1 MVP

We’ve looked at a few different ways of building models this week, including how to prepare the data properly. This weekend we’ll build a multiple linear regression model on a dataset that will need some preparation. The data can be found in the data folder, along with a data dictionary.

We want to investigate the avocado dataset and, in particular, to model the AveragePrice of the avocados. Use the tools we’ve worked with this week to prepare your dataset and find appropriate predictors. Once you’ve built your model, use the validation techniques discussed on Wednesday to evaluate it. Feel free to focus on building either an explanatory or a predictive model, or both if you are feeling energetic!

As part of the MVP, we want you not just to run the code but also to have a go at interpreting the results, writing your thinking as comments in your script.

Hints and tips

  • region may lead to many dummy variables. Think carefully about whether to include this variable or not (there is no one ‘right’ answer to this!)
  • Think about whether each variable is categorical or numerical. If categorical, make sure that the variable is represented as a factor.
  • We will not treat this data as a time series, so Date will not be needed in your models, but can you extract any useful features out of Date before you discard it?
  • If you want to build a predictive model, consider using either leaps or glmulti to help with this.
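leaps and glmulti are not used in the rest of this walkthrough. As a dependency-free illustration of the same idea (automated predictor selection), here is a minimal sketch using base R’s stats::step() for forward selection by AIC. It is a swapped-in technique, not the leaps or glmulti API. The data and column names are made up.

```r
# Made-up data standing in for the avocado table (hypothetical columns)
set.seed(1)
n <- 300
toy <- data.frame(
  type    = factor(sample(c("conventional", "organic"), n, replace = TRUE)),
  quarter = factor(sample(1:4, n, replace = TRUE)),
  noise   = rnorm(n)
)
toy$price <- 1.1 + 0.5 * (toy$type == "organic") +
  0.1 * (toy$quarter == "3") + rnorm(n, sd = 0.2)

# Start from the intercept-only model; at each stage step() adds the term
# that lowers AIC the most, and stops when no addition improves it
null_model <- lm(price ~ 1, data = toy)
forward <- step(
  null_model,
  scope = price ~ type + quarter + noise,
  direction = "forward",
  trace = 0
)
formula(forward)
```

leaps::regsubsets() and glmulti search the space of subsets more exhaustively than this greedy forward pass. It is worth checking their own documentation before relying on either.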

Load libraries:

library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.3
## ✓ tibble  3.0.1     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## Warning: package 'tibble' was built under R version 3.6.2
## ── Conflicts ─────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa
library(modelr)
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

Load the dataset and examine it:

avocados <- clean_names(read_csv("data/avocado.csv"))
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   Date = col_date(format = ""),
##   AveragePrice = col_double(),
##   `Total Volume` = col_double(),
##   `4046` = col_double(),
##   `4225` = col_double(),
##   `4770` = col_double(),
##   `Total Bags` = col_double(),
##   `Small Bags` = col_double(),
##   `Large Bags` = col_double(),
##   `XLarge Bags` = col_double(),
##   type = col_character(),
##   year = col_double(),
##   region = col_character()
## )
summary(avocados)
##        x1             date            average_price    total_volume     
##  Min.   : 0.00   Min.   :2015-01-04   Min.   :0.440   Min.   :      85  
##  1st Qu.:10.00   1st Qu.:2015-10-25   1st Qu.:1.100   1st Qu.:   10839  
##  Median :24.00   Median :2016-08-14   Median :1.370   Median :  107377  
##  Mean   :24.23   Mean   :2016-08-13   Mean   :1.406   Mean   :  850644  
##  3rd Qu.:38.00   3rd Qu.:2017-06-04   3rd Qu.:1.660   3rd Qu.:  432962  
##  Max.   :52.00   Max.   :2018-03-25   Max.   :3.250   Max.   :62505647  
##      x4046              x4225              x4770           total_bags      
##  Min.   :       0   Min.   :       0   Min.   :      0   Min.   :       0  
##  1st Qu.:     854   1st Qu.:    3009   1st Qu.:      0   1st Qu.:    5089  
##  Median :    8645   Median :   29061   Median :    185   Median :   39744  
##  Mean   :  293008   Mean   :  295155   Mean   :  22840   Mean   :  239639  
##  3rd Qu.:  111020   3rd Qu.:  150207   3rd Qu.:   6243   3rd Qu.:  110783  
##  Max.   :22743616   Max.   :20470573   Max.   :2546439   Max.   :19373134  
##    small_bags         large_bags       x_large_bags          type          
##  Min.   :       0   Min.   :      0   Min.   :     0.0   Length:18249      
##  1st Qu.:    2849   1st Qu.:    127   1st Qu.:     0.0   Class :character  
##  Median :   26363   Median :   2648   Median :     0.0   Mode  :character  
##  Mean   :  182195   Mean   :  54338   Mean   :  3106.4                     
##  3rd Qu.:   83338   3rd Qu.:  22029   3rd Qu.:   132.5                     
##  Max.   :13384587   Max.   :5719097   Max.   :551693.7                     
##       year         region         
##  Min.   :2015   Length:18249      
##  1st Qu.:2015   Class :character  
##  Median :2016   Mode  :character  
##  Mean   :2016                     
##  3rd Qu.:2017                     
##  Max.   :2018
head(avocados)
avocados %>%
  distinct(region) %>%
  summarise(number_of_regions = n())
avocados %>%
  distinct(date) %>%
  summarise(
    number_of_dates = n(),
    min_date = min(date),
    max_date = max(date)
  )

The x1 column appears to be a row index left over from how the data was exported (it runs from 0 to 52, like a week counter), so we’ll drop it. The region variable has 54 levels and so will produce many dummy variables, but we can try leaving it in. We should also examine date and pull out whatever useful features we can before discarding it.
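The following sketch shows why region is expensive. It runs model.matrix() on a made-up four-level factor. R encodes a k-level factor as k - 1 dummy columns relative to a reference level (the first level alphabetically). For the real 54-level region, that gives the 53 region coefficients that appear in the single-predictor region model below.

```r
# Made-up four-level factor (region names used only as labels)
toy <- data.frame(
  price  = c(1.2, 1.5, 0.9, 1.1, 1.3, 1.0),
  region = factor(c("Albany", "Atlanta", "Boise", "Boston", "Albany", "Boise"))
)

# The design matrix lm() would build: intercept + one dummy per non-reference level
X <- model.matrix(price ~ region, data = toy)
colnames(X)
ncol(X)  # 1 + (4 - 1) = 4
```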

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
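trimmed_avocados <- avocados %>%
  mutate(
    # Pull the quarter (1-4) out of the date before dropping it
    quarter = as_factor(quarter(date)),
    # Treat year as categorical: only four years are present
    year = as_factor(year),
    type = as_factor(type),
    # Make region an explicit factor (levels sorted alphabetically, as lm() would do anyway)
    region = factor(region)
  ) %>%
  # x1 is a row index; date is now summarised by year and quarter
  select(-c("x1", "date"))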
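Besides quarter, other calendar features can be pulled out of date. Here is a minimal base-R sketch on a few made-up dates. It uses format() for the month and quarters() for the quarter. Note that base quarters() returns the labels "Q1" to "Q4", whereas lubridate::quarter() returns 1 to 4. Whether month adds anything beyond quarter for this dataset is untested here.

```r
# A few made-up dates standing in for the real date column
dates <- as.Date(c("2015-01-04", "2015-06-28", "2016-08-14", "2017-12-31"))

date_features <- data.frame(
  # Month as a 12-level factor, so unseen months still exist as levels
  month   = factor(as.integer(format(dates, "%m")), levels = 1:12),
  # Base R quarter labels, with an explicit level order
  quarter = factor(quarters(dates), levels = c("Q1", "Q2", "Q3", "Q4"))
)
date_features
```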

Now let’s check for aliased variables (i.e. combinations of variables in which one or more of the variables can be calculated exactly from other variables):

alias(average_price ~ ., data = trimmed_avocados )
## Model :
## average_price ~ total_volume + x4046 + x4225 + x4770 + total_bags + 
##     small_bags + large_bags + x_large_bags + type + year + region + 
##     quarter

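Nice: alias() lists only the model formula, so there are no exact aliases. Note, though, that the means in the summary above suggest two near-sums, up to rounding:

  • total_bags is the sum of small_bags, large_bags and x_large_bags.
  • total_volume is the sum of x4046, x4225, x4770 and total_bags.

These variables may therefore be very strongly, if not perfectly, collinear, so be wary of putting them in the same model.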
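alias() only flags exact linear dependencies. This minimal sketch on made-up data shows the difference between an exact sum and a rounded one. The columns small, large, total_exact and total_rounded are hypothetical.

```r
set.seed(1)
n <- 50
toy <- data.frame(small = runif(n, 0, 100), large = runif(n, 0, 100))
toy$y <- 1 + 0.01 * (toy$small + toy$large) + rnorm(n, sd = 0.1)

# Exact sum: total_exact is a perfect linear combination of small and large
toy$total_exact <- toy$small + toy$large
exact <- alias(y ~ small + large + total_exact, data = toy)

# Rounded sum: no longer an exact combination, so no complete alias is reported
toy$total_rounded <- round(toy$small + toy$large)
rounded <- alias(y ~ small + large + total_rounded, data = toy)

exact$Complete    # expresses total_exact in terms of small and large
rounded$Complete  # NULL
```

A rounded total is therefore invisible to alias() even though it carries almost no extra information. Checking correlations or variance inflation factors is a more useful guard in that situation.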

1.1 First variable

Run ggpairs() on the remaining variables, leaving out region (it has too many levels for a pairs plot; we’ll boxplot average_price against region next):

trimmed_avocados %>%
  select(-region) %>%
  ggpairs()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Let’s save that plot so we can zoom in on it more easily:

ggsave("pairs_plot_choice1.png", width = 10, height = 10, units = "in")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
trimmed_avocados %>%
  ggplot(aes(x = region, y = average_price)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
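With 54 regions, the boxplot is easier to read if the regions are sorted by median price. This minimal sketch uses base R’s reorder() on made-up prices for three regions. The same call can go straight into the aes() mapping.

```r
# Made-up prices for three regions
toy <- data.frame(
  region = c("A", "A", "B", "B", "C", "C"),
  price  = c(1.5, 1.7, 0.9, 1.1, 1.2, 1.4)
)

# reorder() sets the factor levels by the median price in each region
toy$region_sorted <- reorder(toy$region, toy$price, FUN = median)
levels(toy$region_sorted)  # "B" "C" "A"

# In the ggplot call this would read (not run here):
# aes(x = reorder(region, average_price, FUN = median), y = average_price)
```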

Based on the pairs plot and the boxplot, let’s test competing single-predictor models with x4046, type, year, quarter and region:

model1a <- lm(average_price ~ x4046, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1a)

summary(model1a)
## 
## Call:
## lm(formula = average_price ~ x4046, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.98539 -0.29842 -0.03531  0.25459  1.82475 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.425e+00  2.993e-03  476.29   <2e-16 ***
## x4046       -6.631e-08  2.305e-09  -28.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3939 on 18247 degrees of freedom
## Multiple R-squared:  0.0434, Adjusted R-squared:  0.04334 
## F-statistic: 827.8 on 1 and 18247 DF,  p-value: < 2.2e-16
model1b <- lm(average_price ~ type, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1b)

summary(model1b)
## 
## Call:
## lm(formula = average_price ~ type, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.21400 -0.20400 -0.02804  0.18600  1.59600 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.158040   0.003321   348.7   <2e-16 ***
## typeorganic 0.495959   0.004697   105.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3173 on 18247 degrees of freedom
## Multiple R-squared:  0.3793, Adjusted R-squared:  0.3792 
## F-statistic: 1.115e+04 on 1 and 18247 DF,  p-value: < 2.2e-16
model1c <- lm(average_price ~ year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1c)

summary(model1c)
## 
## Call:
## lm(formula = average_price ~ year, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.07513 -0.29513 -0.03559  0.25247  1.91136 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.375590   0.005280 260.546  < 2e-16 ***
## year2016    -0.036951   0.007466  -4.949 7.52e-07 ***
## year2017     0.139537   0.007432  18.776  < 2e-16 ***
## year2018    -0.028060   0.012192  -2.301   0.0214 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3956 on 18245 degrees of freedom
## Multiple R-squared:  0.03489,    Adjusted R-squared:  0.03474 
## F-statistic: 219.9 on 3 and 18245 DF,  p-value: < 2.2e-16
model1d <- lm(average_price ~ quarter, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1d)

summary(model1d)
## 
## Call:
## lm(formula = average_price ~ quarter, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.96859 -0.30503 -0.02859  0.25497  1.79497 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.306605   0.005316 245.769   <2e-16 ***
## quarter2    0.068428   0.008077   8.472   <2e-16 ***
## quarter3    0.206308   0.008076  25.545   <2e-16 ***
## quarter4    0.151983   0.008019  18.952   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3946 on 18245 degrees of freedom
## Multiple R-squared:  0.04006,    Adjusted R-squared:  0.03991 
## F-statistic: 253.8 on 3 and 18245 DF,  p-value: < 2.2e-16
model1e <- lm(average_price ~ region, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1e)

summary(model1e)
## 
## Call:
## lm(formula = average_price ~ region, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.97095 -0.28423 -0.03432  0.25207  1.76115 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.561036   0.020006  78.029  < 2e-16 ***
## regionAtlanta             -0.223077   0.028293  -7.885 3.33e-15 ***
## regionBaltimoreWashington -0.026805   0.028293  -0.947  0.34344    
## regionBoise               -0.212899   0.028293  -7.525 5.52e-14 ***
## regionBoston              -0.030148   0.028293  -1.066  0.28663    
## regionBuffaloRochester    -0.044201   0.028293  -1.562  0.11824    
## regionCalifornia          -0.165710   0.028293  -5.857 4.79e-09 ***
## regionCharlotte            0.045000   0.028293   1.591  0.11173    
## regionChicago             -0.004260   0.028293  -0.151  0.88031    
## regionCincinnatiDayton    -0.351834   0.028293 -12.436  < 2e-16 ***
## regionColumbus            -0.308254   0.028293 -10.895  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.028293 -16.805  < 2e-16 ***
## regionDenver              -0.342456   0.028293 -12.104  < 2e-16 ***
## regionDetroit             -0.284941   0.028293 -10.071  < 2e-16 ***
## regionGrandRapids         -0.056036   0.028293  -1.981  0.04765 *  
## regionGreatLakes          -0.222485   0.028293  -7.864 3.94e-15 ***
## regionHarrisburgScranton  -0.047751   0.028293  -1.688  0.09147 .  
## regionHartfordSpringfield  0.257604   0.028293   9.105  < 2e-16 ***
## regionHouston             -0.513107   0.028293 -18.136  < 2e-16 ***
## regionIndianapolis        -0.247041   0.028293  -8.732  < 2e-16 ***
## regionJacksonville        -0.050089   0.028293  -1.770  0.07668 .  
## regionLasVegas            -0.180118   0.028293  -6.366 1.98e-10 ***
## regionLosAngeles          -0.345030   0.028293 -12.195  < 2e-16 ***
## regionLouisville          -0.274349   0.028293  -9.697  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.028293  -4.685 2.82e-06 ***
## regionMidsouth            -0.156272   0.028293  -5.523 3.37e-08 ***
## regionNashville           -0.348935   0.028293 -12.333  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.028293  -9.057  < 2e-16 ***
## regionNewYork              0.166538   0.028293   5.886 4.02e-09 ***
## regionNortheast            0.040888   0.028293   1.445  0.14843    
## regionNorthernNewEngland  -0.083639   0.028293  -2.956  0.00312 ** 
## regionOrlando             -0.054822   0.028293  -1.938  0.05268 .  
## regionPhiladelphia         0.071095   0.028293   2.513  0.01199 *  
## regionPhoenixTucson       -0.336598   0.028293 -11.897  < 2e-16 ***
## regionPittsburgh          -0.196716   0.028293  -6.953 3.70e-12 ***
## regionPlains              -0.124527   0.028293  -4.401 1.08e-05 ***
## regionPortland            -0.243314   0.028293  -8.600  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.028293  -0.209  0.83434    
## regionRichmondNorfolk     -0.269704   0.028293  -9.533  < 2e-16 ***
## regionRoanoke             -0.313107   0.028293 -11.067  < 2e-16 ***
## regionSacramento           0.060533   0.028293   2.140  0.03241 *  
## regionSanDiego            -0.162870   0.028293  -5.757 8.72e-09 ***
## regionSanFrancisco         0.243166   0.028293   8.595  < 2e-16 ***
## regionSeattle             -0.118462   0.028293  -4.187 2.84e-05 ***
## regionSouthCarolina       -0.157751   0.028293  -5.576 2.50e-08 ***
## regionSouthCentral        -0.459793   0.028293 -16.251  < 2e-16 ***
## regionSoutheast           -0.163018   0.028293  -5.762 8.45e-09 ***
## regionSpokane             -0.115444   0.028293  -4.080 4.52e-05 ***
## regionStLouis             -0.130414   0.028293  -4.609 4.06e-06 ***
## regionSyracuse            -0.040710   0.028293  -1.439  0.15020    
## regionTampa               -0.152189   0.028293  -5.379 7.58e-08 ***
## regionTotalUS             -0.242012   0.028293  -8.554  < 2e-16 ***
## regionWest                -0.288817   0.028293 -10.208  < 2e-16 ***
## regionWestTexNewMexico    -0.299334   0.028356 -10.556  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3678 on 18195 degrees of freedom
## Multiple R-squared:  0.1681, Adjusted R-squared:  0.1657 
## F-statistic: 69.38 on 53 and 18195 DF,  p-value: < 2.2e-16

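model1b, with type alone, is clearly the best of these. Its adjusted R² is 0.379, against 0.166 for region, 0.043 for x4046, 0.040 for quarter and 0.035 for year. We’ll keep type and re-run ggpairs() on the residuals (again omitting region).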
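Rather than reading each summary() in turn, the adjusted R² values can be collected in one go. This minimal sketch uses made-up data; the columns type, quarter, volume and price are hypothetical stand-ins. It uses reformulate() to build each one-predictor formula.

```r
set.seed(42)
n <- 200
toy <- data.frame(
  type    = factor(sample(c("conventional", "organic"), n, replace = TRUE)),
  quarter = factor(sample(1:4, n, replace = TRUE)),
  volume  = runif(n, 0, 1e6)
)
# In this toy data only type actually affects price
toy$price <- 1.15 + 0.5 * (toy$type == "organic") + rnorm(n, sd = 0.1)

candidates <- c("type", "quarter", "volume")
adj_r2 <- sapply(candidates, function(v) {
  fit <- lm(reformulate(v, response = "price"), data = toy)
  summary(fit)$adj.r.squared
})
sort(adj_r2, decreasing = TRUE)
```

On the real data, the same loop can be run as follows:

  • candidates c("x4046", "type", "year", "quarter", "region")
  • response "average_price"
  • data = trimmed_avocados

It fits the same models as above, so it should give the same adjusted R² values.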

1.2 Second variable
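This section treats the residuals of model1b as the new target: a good second predictor should explain what type leaves unexplained. Here is a minimal sketch of that idea on made-up data. Where the walkthrough uses modelr::add_residuals(), the sketch uses base residuals(). In place of the pairs plot, it uses correlations.

```r
set.seed(7)
n <- 200
toy <- data.frame(
  type   = factor(sample(c("conventional", "organic"), n, replace = TRUE)),
  bags   = runif(n, 0, 10),
  volume = runif(n, 0, 10)
)
# price depends on type and bags, but not on volume
toy$price <- 1.1 + 0.5 * (toy$type == "organic") + 0.05 * toy$bags +
  rnorm(n, sd = 0.1)

first_model <- lm(price ~ type, data = toy)
toy$resid <- residuals(first_model)

# Correlation of each remaining numeric candidate with the residuals
resid_cor <- sapply(c("bags", "volume"), function(v) cor(toy[[v]], toy$resid))
round(resid_cor, 2)
```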

avocados_remaining_resid <- trimmed_avocados %>%
  add_residuals(model1b) %>%
  select(-c("average_price", "type", "region"))

ggpairs(avocados_remaining_resid)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggsave("pairs_plot_choice2.png", width = 10, height = 10, units = "in")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
trimmed_avocados %>%
  add_residuals(model1b) %>%
  ggplot(aes(x = region, y = resid)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

It looks like x4046, year, quarter and region are our next strongest contenders:

model2a <- lm(average_price ~ type + x4046, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model2a)

summary(model2a)
## 
## Call:
## lm(formula = average_price ~ type + x4046, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.21416 -0.20029 -0.02736  0.18591  1.59589 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.171e+00  3.485e-03  336.13   <2e-16 ***
## typeorganic  4.827e-01  4.802e-03  100.52   <2e-16 ***
## x4046       -2.323e-08  1.898e-09  -12.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.316 on 18246 degrees of freedom
## Multiple R-squared:  0.3843, Adjusted R-squared:  0.3843 
## F-statistic:  5695 on 2 and 18246 DF,  p-value: < 2.2e-16
model2b <- lm(average_price ~ type + year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model2b)

summary(model2b)
## 
## Call:
## lm(formula = average_price ~ type + year, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.32320 -0.18722 -0.01722  0.18278  1.66337 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.127645   0.004704 239.735  < 2e-16 ***
## typeorganic  0.495980   0.004563 108.685  < 2e-16 ***
## year2016    -0.036995   0.005817  -6.360 2.07e-10 ***
## year2017     0.139580   0.005790  24.107  < 2e-16 ***
## year2018    -0.028104   0.009499  -2.959  0.00309 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3082 on 18244 degrees of freedom
## Multiple R-squared:  0.4142, Adjusted R-squared:  0.4141 
## F-statistic:  3225 on 4 and 18244 DF,  p-value: < 2.2e-16
model2c <- lm(average_price ~ type + quarter, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model2c)

summary(model2c)
## 
## Call:
## lm(formula = average_price ~ type + quarter, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.11458 -0.20089 -0.02458  0.18542  1.54687 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.058626   0.004718  224.38   <2e-16 ***
## typeorganic 0.495958   0.004543  109.16   <2e-16 ***
## quarter2    0.068546   0.006282   10.91   <2e-16 ***
## quarter3    0.206308   0.006281   32.84   <2e-16 ***
## quarter4    0.152040   0.006237   24.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3069 on 18244 degrees of freedom
## Multiple R-squared:  0.4193, Adjusted R-squared:  0.4192 
## F-statistic:  3294 on 4 and 18244 DF,  p-value: < 2.2e-16
model2d <- lm(average_price ~ type + region, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model2d)

summary(model2d)
## 
## Call:
## lm(formula = average_price ~ type + region, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.09858 -0.16716 -0.01814  0.14692  1.51320 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.313079   0.014894  88.159  < 2e-16 ***
## typeorganic                0.495912   0.004017 123.452  < 2e-16 ***
## regionAtlanta             -0.223077   0.020871 -10.688  < 2e-16 ***
## regionBaltimoreWashington -0.026805   0.020871  -1.284  0.19906    
## regionBoise               -0.212899   0.020871 -10.201  < 2e-16 ***
## regionBoston              -0.030148   0.020871  -1.444  0.14863    
## regionBuffaloRochester    -0.044201   0.020871  -2.118  0.03421 *  
## regionCalifornia          -0.165710   0.020871  -7.940 2.15e-15 ***
## regionCharlotte            0.045000   0.020871   2.156  0.03109 *  
## regionChicago             -0.004260   0.020871  -0.204  0.83826    
## regionCincinnatiDayton    -0.351834   0.020871 -16.857  < 2e-16 ***
## regionColumbus            -0.308254   0.020871 -14.769  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.020871 -22.780  < 2e-16 ***
## regionDenver              -0.342456   0.020871 -16.408  < 2e-16 ***
## regionDetroit             -0.284941   0.020871 -13.652  < 2e-16 ***
## regionGrandRapids         -0.056036   0.020871  -2.685  0.00726 ** 
## regionGreatLakes          -0.222485   0.020871 -10.660  < 2e-16 ***
## regionHarrisburgScranton  -0.047751   0.020871  -2.288  0.02216 *  
## regionHartfordSpringfield  0.257604   0.020871  12.342  < 2e-16 ***
## regionHouston             -0.513107   0.020871 -24.584  < 2e-16 ***
## regionIndianapolis        -0.247041   0.020871 -11.836  < 2e-16 ***
## regionJacksonville        -0.050089   0.020871  -2.400  0.01641 *  
## regionLasVegas            -0.180118   0.020871  -8.630  < 2e-16 ***
## regionLosAngeles          -0.345030   0.020871 -16.531  < 2e-16 ***
## regionLouisville          -0.274349   0.020871 -13.145  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.020871  -6.351 2.20e-10 ***
## regionMidsouth            -0.156272   0.020871  -7.487 7.35e-14 ***
## regionNashville           -0.348935   0.020871 -16.718  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.020871 -12.277  < 2e-16 ***
## regionNewYork              0.166538   0.020871   7.979 1.56e-15 ***
## regionNortheast            0.040888   0.020871   1.959  0.05013 .  
## regionNorthernNewEngland  -0.083639   0.020871  -4.007 6.16e-05 ***
## regionOrlando             -0.054822   0.020871  -2.627  0.00863 ** 
## regionPhiladelphia         0.071095   0.020871   3.406  0.00066 ***
## regionPhoenixTucson       -0.336598   0.020871 -16.127  < 2e-16 ***
## regionPittsburgh          -0.196716   0.020871  -9.425  < 2e-16 ***
## regionPlains              -0.124527   0.020871  -5.966 2.47e-09 ***
## regionPortland            -0.243314   0.020871 -11.658  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.020871  -0.284  0.77679    
## regionRichmondNorfolk     -0.269704   0.020871 -12.922  < 2e-16 ***
## regionRoanoke             -0.313107   0.020871 -15.002  < 2e-16 ***
## regionSacramento           0.060533   0.020871   2.900  0.00373 ** 
## regionSanDiego            -0.162870   0.020871  -7.803 6.35e-15 ***
## regionSanFrancisco         0.243166   0.020871  11.651  < 2e-16 ***
## regionSeattle             -0.118462   0.020871  -5.676 1.40e-08 ***
## regionSouthCarolina       -0.157751   0.020871  -7.558 4.28e-14 ***
## regionSouthCentral        -0.459793   0.020871 -22.030  < 2e-16 ***
## regionSoutheast           -0.163018   0.020871  -7.811 6.00e-15 ***
## regionSpokane             -0.115444   0.020871  -5.531 3.22e-08 ***
## regionStLouis             -0.130414   0.020871  -6.248 4.24e-10 ***
## regionSyracuse            -0.040710   0.020871  -1.951  0.05113 .  
## regionTampa               -0.152189   0.020871  -7.292 3.18e-13 ***
## regionTotalUS             -0.242012   0.020871 -11.595  < 2e-16 ***
## regionWest                -0.288817   0.020871 -13.838  < 2e-16 ***
## regionWestTexNewMexico    -0.297114   0.020918 -14.204  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2713 on 18194 degrees of freedom
## Multiple R-squared:  0.5473, Adjusted R-squared:  0.546 
## F-statistic: 407.4 on 54 and 18194 DF,  p-value: < 2.2e-16

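So model2d, with type and region, comes out best here. Its adjusted R² is 0.546, against 0.419 for type + quarter, 0.414 for type + year and 0.384 for type + x4046. However, some region coefficients are not significant at the 0.05 level, so let’s run anova() to test whether region should be included as a whole: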

anova(model1b, model2d)

It seems region is significant overall, so we’ll keep it in!
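anova() on two nested lm fits performs an F-test. The numerator is the drop in residual sum of squares per extra parameter, (RSS_small - RSS_big) / (df_small - df_big). The denominator is the larger model’s residual variance, RSS_big / df_big, where the df are residual degrees of freedom. For model1b against model2d, the extra degrees of freedom are 18247 - 18194 = 53, one per non-reference region. This minimal sketch uses made-up data with a four-level region and checks anova() against the hand calculation.

```r
set.seed(3)
n <- 120
toy <- data.frame(
  type   = factor(sample(c("conventional", "organic"), n, replace = TRUE)),
  region = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
)
# Made-up region effects relative to region A
region_effect <- c(A = 0, B = -0.2, C = 0.1, D = -0.05)
toy$price <- 1.1 + 0.5 * (toy$type == "organic") +
  unname(region_effect[as.character(toy$region)]) + rnorm(n, sd = 0.1)

small <- lm(price ~ type, data = toy)
big   <- lm(price ~ type + region, data = toy)
res   <- anova(small, big)

rss_small <- deviance(small)   # residual sum of squares of each fit
rss_big   <- deviance(big)
df_extra  <- df.residual(small) - df.residual(big)  # 3 extra region dummies
f_manual  <- ((rss_small - rss_big) / df_extra) / (rss_big / df.residual(big))

c(anova_F = res$F[2], manual_F = f_manual, p_value = res[["Pr(>F)"]][2])
```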

1.3 Third variable

avocados_remaining_resid <- trimmed_avocados %>%
  add_residuals(model2d) %>%
  select(-c("average_price", "type", "region"))

ggpairs(avocados_remaining_resid)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggsave("pairs_plot_choice3.png", width = 10, height = 10, units = "in")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The next contender variables look to be x_large_bags, year and quarter. Let’s try them out.

model3a <- lm(average_price ~ type + region + x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model3a)

summary(model3a)
## 
## Call:
## lm(formula = average_price ~ type + region + x_large_bags, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.10024 -0.16726 -0.01734  0.14591  1.51156 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.311e+00  1.489e-02  88.033  < 2e-16 ***
## typeorganic                5.001e-01  4.101e-03 121.953  < 2e-16 ***
## regionAtlanta             -2.235e-01  2.086e-02 -10.718  < 2e-16 ***
## regionBaltimoreWashington -2.713e-02  2.086e-02  -1.301 0.193298    
## regionBoise               -2.128e-01  2.086e-02 -10.204  < 2e-16 ***
## regionBoston              -3.023e-02  2.086e-02  -1.449 0.147234    
## regionBuffaloRochester    -4.428e-02  2.086e-02  -2.123 0.033774 *  
## regionCalifornia          -1.762e-01  2.096e-02  -8.408  < 2e-16 ***
## regionCharlotte            4.495e-02  2.086e-02   2.155 0.031177 *  
## regionChicago             -4.936e-03  2.086e-02  -0.237 0.812924    
## regionCincinnatiDayton    -3.523e-01  2.086e-02 -16.890  < 2e-16 ***
## regionColumbus            -3.086e-01  2.086e-02 -14.796  < 2e-16 ***
## regionDallasFtWorth       -4.762e-01  2.086e-02 -22.832  < 2e-16 ***
## regionDenver              -3.425e-01  2.086e-02 -16.420  < 2e-16 ***
## regionDetroit             -2.882e-01  2.087e-02 -13.810  < 2e-16 ***
## regionGrandRapids         -5.764e-02  2.086e-02  -2.763 0.005731 ** 
## regionGreatLakes          -2.353e-01  2.101e-02 -11.198  < 2e-16 ***
## regionHarrisburgScranton  -4.798e-02  2.086e-02  -2.300 0.021451 *  
## regionHartfordSpringfield  2.575e-01  2.086e-02  12.347  < 2e-16 ***
## regionHouston             -5.137e-01  2.086e-02 -24.628  < 2e-16 ***
## regionIndianapolis        -2.475e-01  2.086e-02 -11.867  < 2e-16 ***
## regionJacksonville        -5.021e-02  2.086e-02  -2.407 0.016074 *  
## regionLasVegas            -1.801e-01  2.086e-02  -8.633  < 2e-16 ***
## regionLosAngeles          -3.532e-01  2.092e-02 -16.881  < 2e-16 ***
## regionLouisville          -2.745e-01  2.086e-02 -13.160  < 2e-16 ***
## regionMiamiFtLauderdale   -1.331e-01  2.086e-02  -6.380 1.81e-10 ***
## regionMidsouth            -1.590e-01  2.086e-02  -7.619 2.68e-14 ***
## regionNashville           -3.491e-01  2.086e-02 -16.736  < 2e-16 ***
## regionNewOrleansMobile    -2.572e-01  2.086e-02 -12.330  < 2e-16 ***
## regionNewYork              1.659e-01  2.086e-02   7.954 1.91e-15 ***
## regionNortheast            3.834e-02  2.086e-02   1.838 0.066151 .  
## regionNorthernNewEngland  -8.377e-02  2.086e-02  -4.017 5.93e-05 ***
## regionOrlando             -5.523e-02  2.086e-02  -2.648 0.008111 ** 
## regionPhiladelphia         7.097e-02  2.086e-02   3.403 0.000669 ***
## regionPhoenixTucson       -3.368e-01  2.086e-02 -16.149  < 2e-16 ***
## regionPittsburgh          -1.967e-01  2.086e-02  -9.433  < 2e-16 ***
## regionPlains              -1.267e-01  2.086e-02  -6.072 1.29e-09 ***
## regionPortland            -2.434e-01  2.086e-02 -11.669  < 2e-16 ***
## regionRaleighGreensboro   -6.021e-03  2.086e-02  -0.289 0.772828    
## regionRichmondNorfolk     -2.699e-01  2.086e-02 -12.939  < 2e-16 ***
## regionRoanoke             -3.132e-01  2.086e-02 -15.015  < 2e-16 ***
## regionSacramento           6.020e-02  2.086e-02   2.886 0.003904 ** 
## regionSanDiego            -1.631e-01  2.086e-02  -7.819 5.64e-15 ***
## regionSanFrancisco         2.428e-01  2.086e-02  11.642  < 2e-16 ***
## regionSeattle             -1.185e-01  2.086e-02  -5.682 1.35e-08 ***
## regionSouthCarolina       -1.581e-01  2.086e-02  -7.581 3.59e-14 ***
## regionSouthCentral        -4.650e-01  2.088e-02 -22.268  < 2e-16 ***
## regionSoutheast           -1.680e-01  2.088e-02  -8.046 9.10e-16 ***
## regionSpokane             -1.154e-01  2.086e-02  -5.531 3.22e-08 ***
## regionStLouis             -1.308e-01  2.086e-02  -6.270 3.69e-10 ***
## regionSyracuse            -4.071e-02  2.086e-02  -1.952 0.050993 .  
## regionTampa               -1.526e-01  2.086e-02  -7.315 2.68e-13 ***
## regionTotalUS             -2.852e-01  2.255e-02 -12.648  < 2e-16 ***
## regionWest                -2.904e-01  2.086e-02 -13.922  < 2e-16 ***
## regionWestTexNewMexico    -2.976e-01  2.090e-02 -14.238  < 2e-16 ***
## x_large_bags               6.810e-07  1.351e-07   5.040 4.70e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2711 on 18193 degrees of freedom
## Multiple R-squared:  0.548,  Adjusted R-squared:  0.5466 
## F-statistic:   401 on 55 and 18193 DF,  p-value: < 2.2e-16

model3b <- lm(average_price ~ type + region + year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model3b)

summary(model3b)
## 
## Call:
## lm(formula = average_price ~ type + region + year, data = trimmed_avocados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1532 -0.1497 -0.0060  0.1419  1.4849 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.282672   0.014600  87.857  < 2e-16 ***
## typeorganic                0.495933   0.003859 128.501  < 2e-16 ***
## regionAtlanta             -0.223077   0.020052 -11.125  < 2e-16 ***
## regionBaltimoreWashington -0.026805   0.020052  -1.337 0.181322    
## regionBoise               -0.212899   0.020052 -10.617  < 2e-16 ***
## regionBoston              -0.030148   0.020052  -1.503 0.132735    
## regionBuffaloRochester    -0.044201   0.020052  -2.204 0.027515 *  
## regionCalifornia          -0.165710   0.020052  -8.264  < 2e-16 ***
## regionCharlotte            0.045000   0.020052   2.244 0.024835 *  
## regionChicago             -0.004260   0.020052  -0.212 0.831748    
## regionCincinnatiDayton    -0.351834   0.020052 -17.546  < 2e-16 ***
## regionColumbus            -0.308254   0.020052 -15.373  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.020052 -23.710  < 2e-16 ***
## regionDenver              -0.342456   0.020052 -17.078  < 2e-16 ***
## regionDetroit             -0.284941   0.020052 -14.210  < 2e-16 ***
## regionGrandRapids         -0.056036   0.020052  -2.794 0.005204 ** 
## regionGreatLakes          -0.222485   0.020052 -11.095  < 2e-16 ***
## regionHarrisburgScranton  -0.047751   0.020052  -2.381 0.017259 *  
## regionHartfordSpringfield  0.257604   0.020052  12.847  < 2e-16 ***
## regionHouston             -0.513107   0.020052 -25.589  < 2e-16 ***
## regionIndianapolis        -0.247041   0.020052 -12.320  < 2e-16 ***
## regionJacksonville        -0.050089   0.020052  -2.498 0.012501 *  
## regionLasVegas            -0.180118   0.020052  -8.982  < 2e-16 ***
## regionLosAngeles          -0.345030   0.020052 -17.207  < 2e-16 ***
## regionLouisville          -0.274349   0.020052 -13.682  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.020052  -6.610 3.95e-11 ***
## regionMidsouth            -0.156272   0.020052  -7.793 6.88e-15 ***
## regionNashville           -0.348935   0.020052 -17.401  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.020052 -12.779  < 2e-16 ***
## regionNewYork              0.166538   0.020052   8.305  < 2e-16 ***
## regionNortheast            0.040888   0.020052   2.039 0.041459 *  
## regionNorthernNewEngland  -0.083639   0.020052  -4.171 3.05e-05 ***
## regionOrlando             -0.054822   0.020052  -2.734 0.006263 ** 
## regionPhiladelphia         0.071095   0.020052   3.545 0.000393 ***
## regionPhoenixTucson       -0.336598   0.020052 -16.786  < 2e-16 ***
## regionPittsburgh          -0.196716   0.020052  -9.810  < 2e-16 ***
## regionPlains              -0.124527   0.020052  -6.210 5.41e-10 ***
## regionPortland            -0.243314   0.020052 -12.134  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.020052  -0.295 0.767930    
## regionRichmondNorfolk     -0.269704   0.020052 -13.450  < 2e-16 ***
## regionRoanoke             -0.313107   0.020052 -15.615  < 2e-16 ***
## regionSacramento           0.060533   0.020052   3.019 0.002542 ** 
## regionSanDiego            -0.162870   0.020052  -8.122 4.86e-16 ***
## regionSanFrancisco         0.243166   0.020052  12.127  < 2e-16 ***
## regionSeattle             -0.118462   0.020052  -5.908 3.53e-09 ***
## regionSouthCarolina       -0.157751   0.020052  -7.867 3.83e-15 ***
## regionSouthCentral        -0.459793   0.020052 -22.930  < 2e-16 ***
## regionSoutheast           -0.163018   0.020052  -8.130 4.58e-16 ***
## regionSpokane             -0.115444   0.020052  -5.757 8.69e-09 ***
## regionStLouis             -0.130414   0.020052  -6.504 8.04e-11 ***
## regionSyracuse            -0.040710   0.020052  -2.030 0.042350 *  
## regionTampa               -0.152189   0.020052  -7.590 3.36e-14 ***
## regionTotalUS             -0.242012   0.020052 -12.069  < 2e-16 ***
## regionWest                -0.288817   0.020052 -14.403  < 2e-16 ***
## regionWestTexNewMexico    -0.296552   0.020097 -14.756  < 2e-16 ***
## year2016                  -0.036970   0.004920  -7.515 5.96e-14 ***
## year2017                   0.139555   0.004897  28.500  < 2e-16 ***
## year2018                  -0.028078   0.008033  -3.495 0.000475 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2607 on 18191 degrees of freedom
## Multiple R-squared:  0.5822, Adjusted R-squared:  0.5809 
## F-statistic: 444.8 on 57 and 18191 DF,  p-value: < 2.2e-16

model3c <- lm(average_price ~ type + region + quarter, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model3c)

summary(model3c)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.06767 -0.15971 -0.01185  0.14629  1.54411 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.213689   0.014517  83.603  < 2e-16 ***
## typeorganic                0.495911   0.003835 129.296  < 2e-16 ***
## regionAtlanta             -0.223077   0.019928 -11.194  < 2e-16 ***
## regionBaltimoreWashington -0.026805   0.019928  -1.345 0.178619    
## regionBoise               -0.212899   0.019928 -10.683  < 2e-16 ***
## regionBoston              -0.030148   0.019928  -1.513 0.130339    
## regionBuffaloRochester    -0.044201   0.019928  -2.218 0.026565 *  
## regionCalifornia          -0.165710   0.019928  -8.315  < 2e-16 ***
## regionCharlotte            0.045000   0.019928   2.258 0.023950 *  
## regionChicago             -0.004260   0.019928  -0.214 0.830716    
## regionCincinnatiDayton    -0.351834   0.019928 -17.655  < 2e-16 ***
## regionColumbus            -0.308254   0.019928 -15.468  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.019928 -23.858  < 2e-16 ***
## regionDenver              -0.342456   0.019928 -17.185  < 2e-16 ***
## regionDetroit             -0.284941   0.019928 -14.298  < 2e-16 ***
## regionGrandRapids         -0.056036   0.019928  -2.812 0.004931 ** 
## regionGreatLakes          -0.222485   0.019928 -11.164  < 2e-16 ***
## regionHarrisburgScranton  -0.047751   0.019928  -2.396 0.016577 *  
## regionHartfordSpringfield  0.257604   0.019928  12.927  < 2e-16 ***
## regionHouston             -0.513107   0.019928 -25.748  < 2e-16 ***
## regionIndianapolis        -0.247041   0.019928 -12.397  < 2e-16 ***
## regionJacksonville        -0.050089   0.019928  -2.513 0.011963 *  
## regionLasVegas            -0.180118   0.019928  -9.038  < 2e-16 ***
## regionLosAngeles          -0.345030   0.019928 -17.314  < 2e-16 ***
## regionLouisville          -0.274349   0.019928 -13.767  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.019928  -6.651 2.99e-11 ***
## regionMidsouth            -0.156272   0.019928  -7.842 4.69e-15 ***
## regionNashville           -0.348935   0.019928 -17.510  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.019928 -12.858  < 2e-16 ***
## regionNewYork              0.166538   0.019928   8.357  < 2e-16 ***
## regionNortheast            0.040888   0.019928   2.052 0.040208 *  
## regionNorthernNewEngland  -0.083639   0.019928  -4.197 2.72e-05 ***
## regionOrlando             -0.054822   0.019928  -2.751 0.005947 ** 
## regionPhiladelphia         0.071095   0.019928   3.568 0.000361 ***
## regionPhoenixTucson       -0.336598   0.019928 -16.891  < 2e-16 ***
## regionPittsburgh          -0.196716   0.019928  -9.871  < 2e-16 ***
## regionPlains              -0.124527   0.019928  -6.249 4.23e-10 ***
## regionPortland            -0.243314   0.019928 -12.210  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.019928  -0.297 0.766527    
## regionRichmondNorfolk     -0.269704   0.019928 -13.534  < 2e-16 ***
## regionRoanoke             -0.313107   0.019928 -15.712  < 2e-16 ***
## regionSacramento           0.060533   0.019928   3.038 0.002389 ** 
## regionSanDiego            -0.162870   0.019928  -8.173 3.21e-16 ***
## regionSanFrancisco         0.243166   0.019928  12.202  < 2e-16 ***
## regionSeattle             -0.118462   0.019928  -5.944 2.82e-09 ***
## regionSouthCarolina       -0.157751   0.019928  -7.916 2.59e-15 ***
## regionSouthCentral        -0.459793   0.019928 -23.073  < 2e-16 ***
## regionSoutheast           -0.163018   0.019928  -8.180 3.02e-16 ***
## regionSpokane             -0.115444   0.019928  -5.793 7.03e-09 ***
## regionStLouis             -0.130414   0.019928  -6.544 6.14e-11 ***
## regionSyracuse            -0.040710   0.019928  -2.043 0.041082 *  
## regionTampa               -0.152189   0.019928  -7.637 2.33e-14 ***
## regionTotalUS             -0.242012   0.019928 -12.144  < 2e-16 ***
## regionWest                -0.288817   0.019928 -14.493  < 2e-16 ***
## regionWestTexNewMexico    -0.297141   0.019973 -14.877  < 2e-16 ***
## quarter2                   0.068479   0.005303  12.912  < 2e-16 ***
## quarter3                   0.206308   0.005303  38.906  < 2e-16 ***
## quarter4                   0.152007   0.005265  28.869  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2591 on 18191 degrees of freedom
## Multiple R-squared:  0.5874, Adjusted R-squared:  0.5861 
## F-statistic: 454.3 on 57 and 18191 DF,  p-value: < 2.2e-16

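So model3c, with type, region and quarter, wins out here: it has the highest adjusted R-squared of the three candidates (0.5861, against 0.5809 for model3b and 0.5466 for model3a). The diagnostic plots still look reasonable, with perhaps some mild heteroscedasticity.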
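Rather than reading three summaries side by side, the adjusted R-squared values can be pulled into one table. Below is a minimal base-R sketch; `compare_adj_r2()` is a hypothetical helper, demonstrated on R’s built-in `mtcars` data (standing in for `trimmed_avocados`) so that it runs on its own. In this notebook the call would be `compare_adj_r2(list(model3a = model3a, model3b = model3b, model3c = model3c))`.

```r
# Hypothetical helper: collect coefficient counts and adjusted R-squared
# for a named list of lm() fits into one data frame.
compare_adj_r2 <- function(models) {
  data.frame(
    model  = names(models),
    n_coef = vapply(models, function(m) length(coef(m)), integer(1)),
    adj_r2 = vapply(models, function(m) summary(m)$adj.r.squared, numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Standalone demo on mtcars: a fixed base model plus one candidate each
cars <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))
candidates <- list(
  with_hp   = lm(mpg ~ cyl + wt + hp,   data = cars),
  with_gear = lm(mpg ~ cyl + wt + gear, data = cars),
  with_qsec = lm(mpg ~ cyl + wt + qsec, data = cars)
)
results <- compare_adj_r2(candidates)

# Best candidate first
print(results[order(-results$adj_r2), ])
```

Adjusted R-squared penalises extra coefficients, which matters here because year and quarter each add three dummy variables while x_large_bags adds one.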

1.4 Fourth variable

avocados_remaining_resid <- trimmed_avocados %>%
  add_residuals(model3c) %>%
  select(-c("average_price", "type", "region", "quarter"))

ggpairs(avocados_remaining_resid)

ggsave("pairs_plot_choice4.png", width = 10, height = 10, units = "in")

The contender variables here are x_large_bags and year, so let’s try them out.
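The pairs plot is a visual screen; a numeric version of the same idea is to correlate the current model’s residuals with each remaining numeric column. The sketch below uses a hypothetical helper, `residual_correlations()`, demonstrated on `mtcars`. It assumes `lm()` dropped no rows for missing values (so the residuals line up with the data rows), and it skips factor columns such as year, which are better checked with residual boxplots.

```r
# Hypothetical helper: absolute correlation between a model's residuals
# and every remaining numeric column, largest first.
residual_correlations <- function(model, data, exclude) {
  res <- residuals(model)  # assumes no rows were dropped by lm()
  remaining <- setdiff(names(data), exclude)
  is_num <- vapply(data[remaining], is.numeric, logical(1))
  numeric_cols <- remaining[is_num]  # factor columns are skipped
  cors <- vapply(numeric_cols, function(col) cor(res, data[[col]]), numeric(1))
  sort(abs(cors), decreasing = TRUE)
}

# Standalone demo on mtcars (gear is a factor, so it is skipped)
cars <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))
base_model <- lm(mpg ~ cyl + wt, data = cars)
resid_screen <- residual_correlations(base_model, cars, exclude = c("mpg", "cyl", "wt"))
print(resid_screen)
```

In the notebook the call would be `residual_correlations(model3c, trimmed_avocados, exclude = c("average_price", "type", "region", "quarter"))`.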

model4a <- lm(average_price ~ type + region + quarter + x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model4a)

summary(model4a)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + x_large_bags, 
##     data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.06889 -0.16013 -0.01154  0.14553  1.54291 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.212e+00  1.451e-02  83.493  < 2e-16 ***
## typeorganic                4.998e-01  3.916e-03 127.614  < 2e-16 ***
## regionAtlanta             -2.235e-01  1.992e-02 -11.222  < 2e-16 ***
## regionBaltimoreWashington -2.711e-02  1.992e-02  -1.361 0.173535    
## regionBoise               -2.128e-01  1.992e-02 -10.687  < 2e-16 ***
## regionBoston              -3.022e-02  1.992e-02  -1.518 0.129137    
## regionBuffaloRochester    -4.427e-02  1.992e-02  -2.223 0.026233 *  
## regionCalifornia          -1.753e-01  2.002e-02  -8.759  < 2e-16 ***
## regionCharlotte            4.495e-02  1.992e-02   2.257 0.024015 *  
## regionChicago             -4.877e-03  1.992e-02  -0.245 0.806549    
## regionCincinnatiDayton    -3.522e-01  1.992e-02 -17.686  < 2e-16 ***
## regionColumbus            -3.086e-01  1.992e-02 -15.494  < 2e-16 ***
## regionDallasFtWorth       -4.762e-01  1.992e-02 -23.908  < 2e-16 ***
## regionDenver              -3.425e-01  1.992e-02 -17.196  < 2e-16 ***
## regionDetroit             -2.879e-01  1.993e-02 -14.449  < 2e-16 ***
## regionGrandRapids         -5.750e-02  1.992e-02  -2.887 0.003898 ** 
## regionGreatLakes          -2.342e-01  2.006e-02 -11.671  < 2e-16 ***
## regionHarrisburgScranton  -4.796e-02  1.992e-02  -2.408 0.016054 *  
## regionHartfordSpringfield  2.575e-01  1.992e-02  12.931  < 2e-16 ***
## regionHouston             -5.136e-01  1.992e-02 -25.789  < 2e-16 ***
## regionIndianapolis        -2.475e-01  1.992e-02 -12.426  < 2e-16 ***
## regionJacksonville        -5.020e-02  1.992e-02  -2.521 0.011720 *  
## regionLasVegas            -1.801e-01  1.992e-02  -9.041  < 2e-16 ***
## regionLosAngeles          -3.524e-01  1.998e-02 -17.644  < 2e-16 ***
## regionLouisville          -2.745e-01  1.992e-02 -13.781  < 2e-16 ***
## regionMiamiFtLauderdale   -1.330e-01  1.992e-02  -6.679 2.47e-11 ***
## regionMidsouth            -1.587e-01  1.992e-02  -7.967 1.72e-15 ***
## regionNashville           -3.491e-01  1.992e-02 -17.527  < 2e-16 ***
## regionNewOrleansMobile    -2.571e-01  1.992e-02 -12.909  < 2e-16 ***
## regionNewYork              1.660e-01  1.992e-02   8.333  < 2e-16 ***
## regionNortheast            3.856e-02  1.992e-02   1.936 0.052939 .  
## regionNorthernNewEngland  -8.376e-02  1.992e-02  -4.206 2.61e-05 ***
## regionOrlando             -5.519e-02  1.992e-02  -2.771 0.005592 ** 
## regionPhiladelphia         7.098e-02  1.992e-02   3.564 0.000366 ***
## regionPhoenixTucson       -3.368e-01  1.992e-02 -16.911  < 2e-16 ***
## regionPittsburgh          -1.967e-01  1.992e-02  -9.879  < 2e-16 ***
## regionPlains              -1.265e-01  1.992e-02  -6.350 2.20e-10 ***
## regionPortland            -2.434e-01  1.992e-02 -12.220  < 2e-16 ***
## regionRaleighGreensboro   -6.012e-03  1.992e-02  -0.302 0.762753    
## regionRichmondNorfolk     -2.699e-01  1.992e-02 -13.549  < 2e-16 ***
## regionRoanoke             -3.132e-01  1.992e-02 -15.725  < 2e-16 ***
## regionSacramento           6.023e-02  1.992e-02   3.024 0.002497 ** 
## regionSanDiego            -1.631e-01  1.992e-02  -8.187 2.85e-16 ***
## regionSanFrancisco         2.429e-01  1.992e-02  12.194  < 2e-16 ***
## regionSeattle             -1.185e-01  1.992e-02  -5.950 2.72e-09 ***
## regionSouthCarolina       -1.581e-01  1.992e-02  -7.938 2.18e-15 ***
## regionSouthCentral        -4.646e-01  1.994e-02 -23.297  < 2e-16 ***
## regionSoutheast           -1.676e-01  1.994e-02  -8.404  < 2e-16 ***
## regionSpokane             -1.154e-01  1.992e-02  -5.793 7.02e-09 ***
## regionStLouis             -1.307e-01  1.992e-02  -6.565 5.35e-11 ***
## regionSyracuse            -4.071e-02  1.992e-02  -2.044 0.040974 *  
## regionTampa               -1.525e-01  1.992e-02  -7.659 1.96e-14 ***
## regionTotalUS             -2.814e-01  2.153e-02 -13.068  < 2e-16 ***
## regionWest                -2.903e-01  1.992e-02 -14.573  < 2e-16 ***
## regionWestTexNewMexico    -2.976e-01  1.996e-02 -14.910  < 2e-16 ***
## quarter2                   6.806e-02  5.301e-03  12.839  < 2e-16 ***
## quarter3                   2.055e-01  5.302e-03  38.761  < 2e-16 ***
## quarter4                   1.527e-01  5.264e-03  29.001  < 2e-16 ***
## x_large_bags               6.215e-07  1.292e-07   4.810 1.52e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2589 on 18190 degrees of freedom
## Multiple R-squared:  0.5879, Adjusted R-squared:  0.5866 
## F-statistic: 447.4 on 58 and 18190 DF,  p-value: < 2.2e-16

model4b <- lm(average_price ~ type + region + quarter + year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model4b)

summary(model4b)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year, 
##     data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.03683 -0.14588 -0.00412  0.14386  1.43930 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.167184   0.014290  81.677  < 2e-16 ***
## typeorganic                0.495930   0.003675 134.950  < 2e-16 ***
## regionAtlanta             -0.223077   0.019094 -11.683  < 2e-16 ***
## regionBaltimoreWashington -0.026805   0.019094  -1.404 0.160383    
## regionBoise               -0.212899   0.019094 -11.150  < 2e-16 ***
## regionBoston              -0.030148   0.019094  -1.579 0.114368    
## regionBuffaloRochester    -0.044201   0.019094  -2.315 0.020627 *  
## regionCalifornia          -0.165710   0.019094  -8.679  < 2e-16 ***
## regionCharlotte            0.045000   0.019094   2.357 0.018445 *  
## regionChicago             -0.004260   0.019094  -0.223 0.823439    
## regionCincinnatiDayton    -0.351834   0.019094 -18.427  < 2e-16 ***
## regionColumbus            -0.308254   0.019094 -16.144  < 2e-16 ***
## regionDallasFtWorth       -0.475444   0.019094 -24.900  < 2e-16 ***
## regionDenver              -0.342456   0.019094 -17.935  < 2e-16 ***
## regionDetroit             -0.284941   0.019094 -14.923  < 2e-16 ***
## regionGrandRapids         -0.056036   0.019094  -2.935 0.003342 ** 
## regionGreatLakes          -0.222485   0.019094 -11.652  < 2e-16 ***
## regionHarrisburgScranton  -0.047751   0.019094  -2.501 0.012397 *  
## regionHartfordSpringfield  0.257604   0.019094  13.491  < 2e-16 ***
## regionHouston             -0.513107   0.019094 -26.873  < 2e-16 ***
## regionIndianapolis        -0.247041   0.019094 -12.938  < 2e-16 ***
## regionJacksonville        -0.050089   0.019094  -2.623 0.008716 ** 
## regionLasVegas            -0.180118   0.019094  -9.433  < 2e-16 ***
## regionLosAngeles          -0.345030   0.019094 -18.070  < 2e-16 ***
## regionLouisville          -0.274349   0.019094 -14.368  < 2e-16 ***
## regionMiamiFtLauderdale   -0.132544   0.019094  -6.942 4.00e-12 ***
## regionMidsouth            -0.156272   0.019094  -8.184 2.91e-16 ***
## regionNashville           -0.348935   0.019094 -18.275  < 2e-16 ***
## regionNewOrleansMobile    -0.256243   0.019094 -13.420  < 2e-16 ***
## regionNewYork              0.166538   0.019094   8.722  < 2e-16 ***
## regionNortheast            0.040888   0.019094   2.141 0.032255 *  
## regionNorthernNewEngland  -0.083639   0.019094  -4.380 1.19e-05 ***
## regionOrlando             -0.054822   0.019094  -2.871 0.004094 ** 
## regionPhiladelphia         0.071095   0.019094   3.723 0.000197 ***
## regionPhoenixTucson       -0.336598   0.019094 -17.629  < 2e-16 ***
## regionPittsburgh          -0.196716   0.019094 -10.303  < 2e-16 ***
## regionPlains              -0.124527   0.019094  -6.522 7.13e-11 ***
## regionPortland            -0.243314   0.019094 -12.743  < 2e-16 ***
## regionRaleighGreensboro   -0.005917   0.019094  -0.310 0.756641    
## regionRichmondNorfolk     -0.269704   0.019094 -14.125  < 2e-16 ***
## regionRoanoke             -0.313107   0.019094 -16.398  < 2e-16 ***
## regionSacramento           0.060533   0.019094   3.170 0.001526 ** 
## regionSanDiego            -0.162870   0.019094  -8.530  < 2e-16 ***
## regionSanFrancisco         0.243166   0.019094  12.735  < 2e-16 ***
## regionSeattle             -0.118462   0.019094  -6.204 5.62e-10 ***
## regionSouthCarolina       -0.157751   0.019094  -8.262  < 2e-16 ***
## regionSouthCentral        -0.459793   0.019094 -24.081  < 2e-16 ***
## regionSoutheast           -0.163018   0.019094  -8.538  < 2e-16 ***
## regionSpokane             -0.115444   0.019094  -6.046 1.51e-09 ***
## regionStLouis             -0.130414   0.019094  -6.830 8.75e-12 ***
## regionSyracuse            -0.040710   0.019094  -2.132 0.033011 *  
## regionTampa               -0.152189   0.019094  -7.971 1.67e-15 ***
## regionTotalUS             -0.242012   0.019094 -12.675  < 2e-16 ***
## regionWest                -0.288817   0.019094 -15.126  < 2e-16 ***
## regionWestTexNewMexico    -0.296624   0.019137 -15.500  < 2e-16 ***
## quarter2                   0.081121   0.005410  14.996  < 2e-16 ***
## quarter3                   0.218901   0.005409  40.471  < 2e-16 ***
## quarter4                   0.161972   0.005376  30.130  < 2e-16 ***
## year2016                  -0.036978   0.004684  -7.894 3.10e-15 ***
## year2017                   0.138658   0.004663  29.735  < 2e-16 ***
## year2018                   0.087412   0.008334  10.488  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2482 on 18188 degrees of freedom
## Multiple R-squared:  0.6213, Adjusted R-squared:   0.62 
## F-statistic: 497.3 on 60 and 18188 DF,  p-value: < 2.2e-16

Hmm, model4b with type, region, quarter and year wins here: its adjusted R-squared of 0.62 beats 0.5866 for model4a.
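Because model3c is nested inside model4b, the gain from adding year can also be checked with an F-test via `anova()`. Model4a and model4b are not nested, so for that pair an information criterion such as `AIC()` (lower is better) is the more natural comparison. A sketch on `mtcars` follows; in the notebook the equivalent calls would be `anova(model3c, model4b)` and `AIC(model4a, model4b)`.

```r
cars <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))

# Nested pair: `larger` is `smaller` plus one factor (gear, 2 extra dummies)
smaller <- lm(mpg ~ cyl + wt, data = cars)
larger  <- lm(mpg ~ cyl + wt + gear, data = cars)

# F-test of whether the added term reduces the residual sum of squares
f_test <- anova(smaller, larger)
print(f_test)
added_term_p <- f_test[["Pr(>F)"]][2]  # p-value for the added term(s)

# Non-nested pair: compare with AIC instead (lower is better)
rival <- lm(mpg ~ cyl + wt + hp, data = cars)
aic_table <- AIC(larger, rival)
print(aic_table)
```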

1.5 Fifth variable

We are likely now pursuing variables with rather limited explanatory power, but let’s check for one more main effect.

avocados_remaining_resid <- trimmed_avocados %>%
  add_residuals(model4b) %>%
  select(-c("average_price", "type", "region", "quarter", "year"))

ggpairs(avocados_remaining_resid)

ggsave("pairs_plot_choice5.png", width = 10, height = 10, units = "in")

It looks like x_large_bags is the remaining contender; let’s check it out!

model5 <- lm(average_price ~ type + region + quarter + year + x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5)

summary(model5)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     x_large_bags, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.03610 -0.14545 -0.00439  0.14420  1.43907 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.167e+00  1.429e-02  81.687  < 2e-16 ***
## typeorganic                4.982e-01  3.755e-03 132.674  < 2e-16 ***
## regionAtlanta             -2.233e-01  1.909e-02 -11.698  < 2e-16 ***
## regionBaltimoreWashington -2.698e-02  1.909e-02  -1.413 0.157614    
## regionBoise               -2.129e-01  1.909e-02 -11.151  < 2e-16 ***
## regionBoston              -3.019e-02  1.909e-02  -1.582 0.113769    
## regionBuffaloRochester    -4.424e-02  1.909e-02  -2.318 0.020485 *  
## regionCalifornia          -1.713e-01  1.919e-02  -8.925  < 2e-16 ***
## regionCharlotte            4.497e-02  1.909e-02   2.356 0.018493 *  
## regionChicago             -4.616e-03  1.909e-02  -0.242 0.808941    
## regionCincinnatiDayton    -3.521e-01  1.909e-02 -18.442  < 2e-16 ***
## regionColumbus            -3.084e-01  1.909e-02 -16.157  < 2e-16 ***
## regionDallasFtWorth       -4.759e-01  1.909e-02 -24.926  < 2e-16 ***
## regionDenver              -3.425e-01  1.909e-02 -17.940  < 2e-16 ***
## regionDetroit             -2.866e-01  1.910e-02 -15.008  < 2e-16 ***
## regionGrandRapids         -5.688e-02  1.909e-02  -2.979 0.002894 ** 
## regionGreatLakes          -2.292e-01  1.923e-02 -11.918  < 2e-16 ***
## regionHarrisburgScranton  -4.787e-02  1.909e-02  -2.508 0.012166 *  
## regionHartfordSpringfield  2.576e-01  1.909e-02  13.492  < 2e-16 ***
## regionHouston             -5.134e-01  1.909e-02 -26.894  < 2e-16 ***
## regionIndianapolis        -2.473e-01  1.909e-02 -12.954  < 2e-16 ***
## regionJacksonville        -5.015e-02  1.909e-02  -2.627 0.008615 ** 
## regionLasVegas            -1.801e-01  1.909e-02  -9.434  < 2e-16 ***
## regionLosAngeles          -3.493e-01  1.915e-02 -18.243  < 2e-16 ***
## regionLouisville          -2.744e-01  1.909e-02 -14.375  < 2e-16 ***
## regionMiamiFtLauderdale   -1.328e-01  1.909e-02  -6.958 3.58e-12 ***
## regionMidsouth            -1.577e-01  1.910e-02  -8.257  < 2e-16 ***
## regionNashville           -3.490e-01  1.909e-02 -18.282  < 2e-16 ***
## regionNewOrleansMobile    -2.567e-01  1.909e-02 -13.448  < 2e-16 ***
## regionNewYork              1.662e-01  1.909e-02   8.706  < 2e-16 ***
## regionNortheast            3.955e-02  1.910e-02   2.071 0.038381 *  
## regionNorthernNewEngland  -8.371e-02  1.909e-02  -4.385 1.17e-05 ***
## regionOrlando             -5.503e-02  1.909e-02  -2.883 0.003945 ** 
## regionPhiladelphia         7.103e-02  1.909e-02   3.721 0.000199 ***
## regionPhoenixTucson       -3.367e-01  1.909e-02 -17.638  < 2e-16 ***
## regionPittsburgh          -1.967e-01  1.909e-02 -10.305  < 2e-16 ***
## regionPlains              -1.257e-01  1.909e-02  -6.581 4.80e-11 ***
## regionPortland            -2.434e-01  1.909e-02 -12.748  < 2e-16 ***
## regionRaleighGreensboro   -5.972e-03  1.909e-02  -0.313 0.754415    
## regionRichmondNorfolk     -2.698e-01  1.909e-02 -14.132  < 2e-16 ***
## regionRoanoke             -3.131e-01  1.909e-02 -16.404  < 2e-16 ***
## regionSacramento           6.036e-02  1.909e-02   3.162 0.001571 ** 
## regionSanDiego            -1.630e-01  1.909e-02  -8.537  < 2e-16 ***
## regionSanFrancisco         2.430e-01  1.909e-02  12.728  < 2e-16 ***
## regionSeattle             -1.185e-01  1.909e-02  -6.207 5.52e-10 ***
## regionSouthCarolina       -1.579e-01  1.909e-02  -8.274  < 2e-16 ***
## regionSouthCentral        -4.625e-01  1.911e-02 -24.199  < 2e-16 ***
## regionSoutheast           -1.656e-01  1.911e-02  -8.667  < 2e-16 ***
## regionSpokane             -1.154e-01  1.909e-02  -6.045 1.52e-09 ***
## regionStLouis             -1.306e-01  1.909e-02  -6.842 8.08e-12 ***
## regionSyracuse            -4.071e-02  1.909e-02  -2.132 0.032984 *  
## regionTampa               -1.524e-01  1.909e-02  -7.983 1.52e-15 ***
## regionTotalUS             -2.647e-01  2.066e-02 -12.815  < 2e-16 ***
## regionWest                -2.897e-01  1.909e-02 -15.171  < 2e-16 ***
## regionWestTexNewMexico    -2.969e-01  1.913e-02 -15.518  < 2e-16 ***
## quarter2                   8.058e-02  5.412e-03  14.891  < 2e-16 ***
## quarter3                   2.181e-01  5.414e-03  40.293  < 2e-16 ***
## quarter4                   1.621e-01  5.375e-03  30.154  < 2e-16 ***
## year2016                  -3.791e-02  4.695e-03  -8.075 7.16e-16 ***
## year2017                   1.375e-01  4.680e-03  29.381  < 2e-16 ***
## year2018                   8.547e-02  8.360e-03  10.223  < 2e-16 ***
## x_large_bags               3.583e-07  1.246e-07   2.877 0.004025 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2482 on 18187 degrees of freedom
## Multiple R-squared:  0.6214, Adjusted R-squared:  0.6202 
## F-statistic: 489.4 on 61 and 18187 DF,  p-value: < 2.2e-16

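x_large_bags is a significant explanatory variable (p = 0.004), so let’s keep it, although the gain is small: adjusted R-squared only rises from 0.6200 to 0.6202. Overall, we still have some heteroscedasticity and deviations from normality in the residuals.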
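One way to put a number on the heteroscedasticity is a Breusch-Pagan test. The sketch below writes out the studentised (Koenker) version in base R: regress the squared residuals on the model’s regressors and compare n times that auxiliary R-squared with a chi-squared distribution. `bp_test()` is a hypothetical helper demonstrated on `mtcars`; if the lmtest package is installed, `lmtest::bptest(model5)` runs the same kind of test. A small p-value suggests non-constant residual variance.

```r
# Hypothetical helper: studentised Breusch-Pagan test for an lm() fit.
bp_test <- function(model) {
  X  <- model.matrix(model)       # regressors, including the intercept column
  e2 <- residuals(model)^2        # squared residuals
  aux <- lm.fit(X, e2)            # auxiliary regression of e^2 on X
  rss <- sum(aux$residuals^2)
  tss <- sum((e2 - mean(e2))^2)
  r2  <- 1 - rss / tss            # centred R-squared of the auxiliary fit
  k   <- aux$rank - 1             # number of regressors excluding the intercept
  stat <- length(e2) * r2         # LM statistic, approx. chi-squared with k df
  c(statistic = stat, df = k,
    p_value = pchisq(stat, df = k, lower.tail = FALSE))
}

# Standalone demo on mtcars; in the notebook: bp_test(model5)
cars <- transform(mtcars, cyl = factor(cyl))
bp <- bp_test(lm(mpg ~ cyl + wt, data = cars))
print(bp)
```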

1.6 Pair interaction

Let’s now think about possible pair interactions: with five main effect variables there are choose(5, 2) = 10 possible pair interactions. Let’s test them out.
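Writing out ten formulas by hand is error-prone, so the interaction terms can be generated with `combn()` and each one fitted in turn on top of the main effects. A minimal sketch, assuming the five main-effect names used above; `rank_pair_interactions()` is a hypothetical helper, demonstrated on `mtcars` with three main effects, so the demo ranks three interactions rather than ten.

```r
# All pairwise interaction terms among the five main effects
main_effects <- c("type", "region", "quarter", "year", "x_large_bags")
pair_terms <- combn(main_effects, 2, FUN = paste, collapse = ":")

# Hypothetical helper: fit main effects + one interaction at a time,
# then rank the interactions by adjusted R-squared.
rank_pair_interactions <- function(data, response, main_effects) {
  pairs <- combn(main_effects, 2, FUN = paste, collapse = ":")
  adj_r2 <- vapply(pairs, function(term) {
    f <- reformulate(c(main_effects, term), response = response)
    summary(lm(f, data = data))$adj.r.squared
  }, numeric(1))
  sort(adj_r2, decreasing = TRUE)
}

# Standalone demo on mtcars
cars <- transform(mtcars, cyl = factor(cyl))
ranked <- rank_pair_interactions(cars, "mpg", c("cyl", "wt", "hp"))
print(ranked)
```

In the notebook the call would be `rank_pair_interactions(trimmed_avocados, "average_price", main_effects)`. Interactions involving region add many coefficients (region has 54 levels), so it is worth checking the diagnostics as well as the ranking.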

model5pa <- lm(average_price ~ type + region + quarter + year + x_large_bags + type:region, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pa)

summary(model5pa)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     x_large_bags + type:region, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.00812 -0.13347 -0.00249  0.13359  1.48016 
## 
## Coefficients:
##                                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)                            1.203e+00  1.855e-02  64.874  < 2e-16
## typeorganic                            4.246e-01  2.558e-02  16.598  < 2e-16
## regionAtlanta                         -2.801e-01  2.558e-02 -10.950  < 2e-16
## regionBaltimoreWashington             -4.684e-03  2.558e-02  -0.183 0.854724
## regionBoise                           -2.727e-01  2.558e-02 -10.660  < 2e-16
## regionBoston                          -4.441e-02  2.558e-02  -1.736 0.082557
## regionBuffaloRochester                 3.352e-02  2.558e-02   1.310 0.190080
## regionCalifornia                      -2.474e-01  2.600e-02  -9.516  < 2e-16
## regionCharlotte                       -7.369e-02  2.558e-02  -2.881 0.003973
## regionChicago                          2.033e-02  2.558e-02   0.795 0.426797
## regionCincinnatiDayton                -3.334e-01  2.558e-02 -13.034  < 2e-16
## regionColumbus                        -2.826e-01  2.558e-02 -11.048  < 2e-16
## regionDallasFtWorth                   -5.026e-01  2.558e-02 -19.647  < 2e-16
## regionDenver                          -2.748e-01  2.558e-02 -10.743  < 2e-16
## regionDetroit                         -2.260e-01  2.562e-02  -8.823  < 2e-16
## regionGrandRapids                     -2.435e-02  2.559e-02  -0.951 0.341382
## regionGreatLakes                      -1.718e-01  2.619e-02  -6.560 5.54e-11
## regionHarrisburgScranton              -9.003e-02  2.558e-02  -3.519 0.000434
## regionHartfordSpringfield              5.926e-02  2.558e-02   2.317 0.020528
## regionHouston                         -5.239e-01  2.558e-02 -20.479  < 2e-16
## regionIndianapolis                    -2.041e-01  2.558e-02  -7.978 1.57e-15
## regionJacksonville                    -1.552e-01  2.558e-02  -6.067 1.33e-09
## regionLasVegas                        -3.358e-01  2.558e-02 -13.126  < 2e-16
## regionLosAngeles                      -3.755e-01  2.583e-02 -14.536  < 2e-16
## regionLouisville                      -2.435e-01  2.558e-02  -9.518  < 2e-16
## regionMiamiFtLauderdale               -9.464e-02  2.558e-02  -3.700 0.000217
## regionMidsouth                        -1.426e-01  2.561e-02  -5.570 2.58e-08
## regionNashville                       -3.359e-01  2.558e-02 -13.132  < 2e-16
## regionNewOrleansMobile                -2.639e-01  2.558e-02 -10.313  < 2e-16
## regionNewYork                          5.313e-02  2.558e-02   2.077 0.037842
## regionNortheast                       -5.307e-03  2.560e-02  -0.207 0.835817
## regionNorthernNewEngland              -8.857e-02  2.558e-02  -3.463 0.000536
## regionOrlando                         -1.345e-01  2.558e-02  -5.257 1.48e-07
## regionPhiladelphia                     4.753e-02  2.558e-02   1.858 0.063204
## regionPhoenixTucson                   -6.206e-01  2.558e-02 -24.261  < 2e-16
## regionPittsburgh                      -9.812e-02  2.558e-02  -3.836 0.000126
## regionPlains                          -1.841e-01  2.560e-02  -7.192 6.66e-13
## regionPortland                        -3.023e-01  2.558e-02 -11.817  < 2e-16
## regionRaleighGreensboro               -1.217e-01  2.558e-02  -4.757 1.98e-06
## regionRichmondNorfolk                 -2.290e-01  2.558e-02  -8.952  < 2e-16
## regionRoanoke                         -2.528e-01  2.558e-02  -9.881  < 2e-16
## regionSacramento                      -7.492e-02  2.558e-02  -2.929 0.003407
## regionSanDiego                        -2.874e-01  2.558e-02 -11.233  < 2e-16
## regionSanFrancisco                     4.827e-02  2.558e-02   1.887 0.059175
## regionSeattle                         -1.790e-01  2.558e-02  -6.998 2.69e-12
## regionSouthCarolina                   -2.027e-01  2.558e-02  -7.923 2.44e-15
## regionSouthCentral                    -4.814e-01  2.568e-02 -18.742  < 2e-16
## regionSoutheast                       -1.877e-01  2.567e-02  -7.310 2.79e-13
## regionSpokane                         -2.328e-01  2.558e-02  -9.099  < 2e-16
## regionStLouis                         -1.632e-01  2.558e-02  -6.378 1.84e-10
## regionSyracuse                         3.817e-02  2.558e-02   1.492 0.135705
## regionTampa                           -1.473e-01  2.558e-02  -5.759 8.62e-09
## regionTotalUS                         -2.734e-01  3.186e-02  -8.583  < 2e-16
## regionWest                            -3.643e-01  2.559e-02 -14.235  < 2e-16
## regionWestTexNewMexico                -5.068e-01  2.558e-02 -19.813  < 2e-16
## quarter2                               8.101e-02  5.129e-03  15.793  < 2e-16
## quarter3                               2.186e-01  5.134e-03  42.587  < 2e-16
## quarter4                               1.620e-01  5.093e-03  31.820  < 2e-16
## year2016                              -3.735e-02  4.455e-03  -8.385  < 2e-16
## year2017                               1.383e-01  4.444e-03  31.110  < 2e-16
## year2018                               8.670e-02  7.937e-03  10.923  < 2e-16
## x_large_bags                           1.318e-07  1.499e-07   0.879 0.379416
## typeorganic:regionAtlanta              1.139e-01  3.618e-02   3.149 0.001642
## typeorganic:regionBaltimoreWashington -4.437e-02  3.618e-02  -1.226 0.220035
## typeorganic:regionBoise                1.196e-01  3.618e-02   3.307 0.000946
## typeorganic:regionBoston               2.849e-02  3.618e-02   0.788 0.430916
## typeorganic:regionBuffaloRochester    -1.555e-01  3.618e-02  -4.298 1.74e-05
## typeorganic:regionCalifornia           1.593e-01  3.647e-02   4.367 1.27e-05
## typeorganic:regionCharlotte            2.374e-01  3.618e-02   6.561 5.48e-11
## typeorganic:regionChicago             -4.944e-02  3.618e-02  -1.367 0.171744
## typeorganic:regionCincinnatiDayton    -3.699e-02  3.618e-02  -1.022 0.306593
## typeorganic:regionColumbus            -5.140e-02  3.618e-02  -1.421 0.155386
## typeorganic:regionDallasFtWorth        5.403e-02  3.618e-02   1.493 0.135327
## typeorganic:regionDenver              -1.353e-01  3.618e-02  -3.741 0.000184
## typeorganic:regionDetroit             -1.190e-01  3.620e-02  -3.288 0.001010
## typeorganic:regionGrandRapids         -6.400e-02  3.618e-02  -1.769 0.076968
## typeorganic:regionGreatLakes          -1.063e-01  3.661e-02  -2.903 0.003698
## typeorganic:regionHarrisburgScranton   8.447e-02  3.618e-02   2.335 0.019563
## typeorganic:regionHartfordSpringfield  3.967e-01  3.618e-02  10.965  < 2e-16
## typeorganic:regionHouston              2.134e-02  3.618e-02   0.590 0.555192
## typeorganic:regionIndianapolis        -8.609e-02  3.618e-02  -2.380 0.017343
## typeorganic:regionJacksonville         2.102e-01  3.618e-02   5.810 6.37e-09
## typeorganic:regionLasVegas             3.113e-01  3.618e-02   8.606  < 2e-16
## typeorganic:regionLosAngeles           5.770e-02  3.635e-02   1.587 0.112476
## typeorganic:regionLouisville          -6.178e-02  3.618e-02  -1.708 0.087678
## typeorganic:regionMiamiFtLauderdale   -7.601e-02  3.618e-02  -2.101 0.035652
## typeorganic:regionMidsouth            -2.831e-02  3.620e-02  -0.782 0.434169
## typeorganic:regionNashville           -2.610e-02  3.618e-02  -0.721 0.470616
## typeorganic:regionNewOrleansMobile     1.486e-02  3.618e-02   0.411 0.681207
## typeorganic:regionNewYork              2.266e-01  3.618e-02   6.263 3.86e-10
## typeorganic:regionNortheast            9.140e-02  3.619e-02   2.525 0.011567
## typeorganic:regionNorthernNewEngland   9.816e-03  3.618e-02   0.271 0.786139
## typeorganic:regionOrlando              1.591e-01  3.618e-02   4.399 1.09e-05
## typeorganic:regionPhiladelphia         4.709e-02  3.618e-02   1.302 0.193037
## typeorganic:regionPhoenixTucson        5.680e-01  3.618e-02  15.700  < 2e-16
## typeorganic:regionPittsburgh          -1.972e-01  3.618e-02  -5.451 5.06e-08
## typeorganic:regionPlains               1.183e-01  3.619e-02   3.269 0.001082
## typeorganic:regionPortland             1.179e-01  3.618e-02   3.259 0.001120
## typeorganic:regionRaleighGreensboro    2.315e-01  3.618e-02   6.400 1.59e-10
## typeorganic:regionRichmondNorfolk     -8.148e-02  3.618e-02  -2.252 0.024322
## typeorganic:regionRoanoke             -1.207e-01  3.618e-02  -3.338 0.000847
## typeorganic:regionSacramento           2.708e-01  3.618e-02   7.485 7.48e-14
## typeorganic:regionSanDiego             2.489e-01  3.618e-02   6.880 6.18e-12
## typeorganic:regionSanFrancisco         3.897e-01  3.618e-02  10.771  < 2e-16
## typeorganic:regionSeattle              1.211e-01  3.618e-02   3.347 0.000819
## typeorganic:regionSouthCarolina        8.973e-02  3.618e-02   2.480 0.013136
## typeorganic:regionSouthCentral         4.114e-02  3.625e-02   1.135 0.256458
## typeorganic:regionSoutheast            4.737e-02  3.624e-02   1.307 0.191198
## typeorganic:regionSpokane              2.346e-01  3.618e-02   6.486 9.03e-11
## typeorganic:regionStLouis              6.535e-02  3.618e-02   1.806 0.070875
## typeorganic:regionSyracuse            -1.578e-01  3.618e-02  -4.361 1.30e-05
## typeorganic:regionTampa               -9.910e-03  3.618e-02  -0.274 0.784145
## typeorganic:regionTotalUS              4.616e-02  4.086e-02   1.130 0.258597
## typeorganic:regionWest                 1.503e-01  3.618e-02   4.154 3.28e-05
## typeorganic:regionWestTexNewMexico     4.234e-01  3.626e-02  11.676  < 2e-16
##                                          
## (Intercept)                           ***
## typeorganic                           ***
## regionAtlanta                         ***
## regionBaltimoreWashington                
## regionBoise                           ***
## regionBoston                          .  
## regionBuffaloRochester                   
## regionCalifornia                      ***
## regionCharlotte                       ** 
## regionChicago                            
## regionCincinnatiDayton                ***
## regionColumbus                        ***
## regionDallasFtWorth                   ***
## regionDenver                          ***
## regionDetroit                         ***
## regionGrandRapids                        
## regionGreatLakes                      ***
## regionHarrisburgScranton              ***
## regionHartfordSpringfield             *  
## regionHouston                         ***
## regionIndianapolis                    ***
## regionJacksonville                    ***
## regionLasVegas                        ***
## regionLosAngeles                      ***
## regionLouisville                      ***
## regionMiamiFtLauderdale               ***
## regionMidsouth                        ***
## regionNashville                       ***
## regionNewOrleansMobile                ***
## regionNewYork                         *  
## regionNortheast                          
## regionNorthernNewEngland              ***
## regionOrlando                         ***
## regionPhiladelphia                    .  
## regionPhoenixTucson                   ***
## regionPittsburgh                      ***
## regionPlains                          ***
## regionPortland                        ***
## regionRaleighGreensboro               ***
## regionRichmondNorfolk                 ***
## regionRoanoke                         ***
## regionSacramento                      ** 
## regionSanDiego                        ***
## regionSanFrancisco                    .  
## regionSeattle                         ***
## regionSouthCarolina                   ***
## regionSouthCentral                    ***
## regionSoutheast                       ***
## regionSpokane                         ***
## regionStLouis                         ***
## regionSyracuse                           
## regionTampa                           ***
## regionTotalUS                         ***
## regionWest                            ***
## regionWestTexNewMexico                ***
## quarter2                              ***
## quarter3                              ***
## quarter4                              ***
## year2016                              ***
## year2017                              ***
## year2018                              ***
## x_large_bags                             
## typeorganic:regionAtlanta             ** 
## typeorganic:regionBaltimoreWashington    
## typeorganic:regionBoise               ***
## typeorganic:regionBoston                 
## typeorganic:regionBuffaloRochester    ***
## typeorganic:regionCalifornia          ***
## typeorganic:regionCharlotte           ***
## typeorganic:regionChicago                
## typeorganic:regionCincinnatiDayton       
## typeorganic:regionColumbus               
## typeorganic:regionDallasFtWorth          
## typeorganic:regionDenver              ***
## typeorganic:regionDetroit             ** 
## typeorganic:regionGrandRapids         .  
## typeorganic:regionGreatLakes          ** 
## typeorganic:regionHarrisburgScranton  *  
## typeorganic:regionHartfordSpringfield ***
## typeorganic:regionHouston                
## typeorganic:regionIndianapolis        *  
## typeorganic:regionJacksonville        ***
## typeorganic:regionLasVegas            ***
## typeorganic:regionLosAngeles             
## typeorganic:regionLouisville          .  
## typeorganic:regionMiamiFtLauderdale   *  
## typeorganic:regionMidsouth               
## typeorganic:regionNashville              
## typeorganic:regionNewOrleansMobile       
## typeorganic:regionNewYork             ***
## typeorganic:regionNortheast           *  
## typeorganic:regionNorthernNewEngland     
## typeorganic:regionOrlando             ***
## typeorganic:regionPhiladelphia           
## typeorganic:regionPhoenixTucson       ***
## typeorganic:regionPittsburgh          ***
## typeorganic:regionPlains              ** 
## typeorganic:regionPortland            ** 
## typeorganic:regionRaleighGreensboro   ***
## typeorganic:regionRichmondNorfolk     *  
## typeorganic:regionRoanoke             ***
## typeorganic:regionSacramento          ***
## typeorganic:regionSanDiego            ***
## typeorganic:regionSanFrancisco        ***
## typeorganic:regionSeattle             ***
## typeorganic:regionSouthCarolina       *  
## typeorganic:regionSouthCentral           
## typeorganic:regionSoutheast              
## typeorganic:regionSpokane             ***
## typeorganic:regionStLouis             .  
## typeorganic:regionSyracuse            ***
## typeorganic:regionTampa                  
## typeorganic:regionTotalUS                
## typeorganic:regionWest                ***
## typeorganic:regionWestTexNewMexico    ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2351 on 18134 degrees of freedom
## Multiple R-squared:  0.6611, Adjusted R-squared:  0.659 
## F-statistic: 310.3 on 114 and 18134 DF,  p-value: < 2.2e-16
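With the type:region interaction, the organic premium is no longer a single number: in each region it is the `typeorganic` coefficient plus that region's `typeorganic:region...` term (the reference region, which has no row in the table, just gets `typeorganic`). For example, from the table above, Atlanta's premium is 0.4246 + 0.1139 ≈ 0.539. The sketch below tabulates this for every level; `organic_premium_by()` is a hypothetical helper and assumes `conventional` is the reference level of `type`, as the coefficient names suggest.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: organic premium for each level of a factor that
# interacts with type, i.e. typeorganic + typeorganic:<factor><level>.
organic_premium_by <- function(model, factor_name) {
  coefs <- coef(model)
  base <- coefs[["typeorganic"]]
  prefix <- paste0("^typeorganic:", factor_name)
  int_names <- grep(prefix, names(coefs), value = TRUE)
  bind_rows(
    # Reference level: the premium is just the typeorganic coefficient
    tibble(level = "(reference)", organic_premium = base),
    tibble(
      level = sub(prefix, "", int_names),
      organic_premium = base + unname(coefs[int_names])
    )
  )
}
```

For instance, `organic_premium_by(model5pa, "region") %>% arrange(desc(organic_premium))` would rank regions by how much more organic avocados cost there, holding quarter, year and x_large_bags fixed.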
model5pb <- lm(average_price ~ type + region + quarter + year + x_large_bags + type:quarter, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pb)

summary(model5pb)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     x_large_bags + type:quarter, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.02270 -0.14602 -0.00362  0.14398  1.44165 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.180e+00  1.454e-02  81.176  < 2e-16 ***
## typeorganic                4.717e-01  6.719e-03  70.203  < 2e-16 ***
## regionAtlanta             -2.233e-01  1.907e-02 -11.713  < 2e-16 ***
## regionBaltimoreWashington -2.699e-02  1.907e-02  -1.416 0.156893    
## regionBoise               -2.129e-01  1.907e-02 -11.163  < 2e-16 ***
## regionBoston              -3.020e-02  1.907e-02  -1.584 0.113308    
## regionBuffaloRochester    -4.425e-02  1.907e-02  -2.320 0.020331 *  
## regionCalifornia          -1.718e-01  1.917e-02  -8.962  < 2e-16 ***
## regionCharlotte            4.497e-02  1.907e-02   2.358 0.018367 *  
## regionChicago             -4.649e-03  1.907e-02  -0.244 0.807387    
## regionCincinnatiDayton    -3.521e-01  1.907e-02 -18.465  < 2e-16 ***
## regionColumbus            -3.085e-01  1.907e-02 -16.177  < 2e-16 ***
## regionDallasFtWorth       -4.759e-01  1.907e-02 -24.957  < 2e-16 ***
## regionDenver              -3.425e-01  1.907e-02 -17.960  < 2e-16 ***
## regionDetroit             -2.868e-01  1.908e-02 -15.034  < 2e-16 ***
## regionGrandRapids         -5.696e-02  1.907e-02  -2.987 0.002824 ** 
## regionGreatLakes          -2.298e-01  1.921e-02 -11.964  < 2e-16 ***
## regionHarrisburgScranton  -4.788e-02  1.907e-02  -2.511 0.012048 *  
## regionHartfordSpringfield  2.576e-01  1.907e-02  13.508  < 2e-16 ***
## regionHouston             -5.134e-01  1.907e-02 -26.926  < 2e-16 ***
## regionIndianapolis        -2.473e-01  1.907e-02 -12.970  < 2e-16 ***
## regionJacksonville        -5.016e-02  1.907e-02  -2.631 0.008531 ** 
## regionLasVegas            -1.801e-01  1.907e-02  -9.444  < 2e-16 ***
## regionLosAngeles          -3.497e-01  1.913e-02 -18.284  < 2e-16 ***
## regionLouisville          -2.744e-01  1.907e-02 -14.392  < 2e-16 ***
## regionMiamiFtLauderdale   -1.328e-01  1.907e-02  -6.967 3.35e-12 ***
## regionMidsouth            -1.578e-01  1.907e-02  -8.274  < 2e-16 ***
## regionNashville           -3.490e-01  1.907e-02 -18.303  < 2e-16 ***
## regionNewOrleansMobile    -2.568e-01  1.907e-02 -13.466  < 2e-16 ***
## regionNewYork              1.662e-01  1.907e-02   8.714  < 2e-16 ***
## regionNortheast            3.942e-02  1.907e-02   2.067 0.038772 *  
## regionNorthernNewEngland  -8.372e-02  1.907e-02  -4.390 1.14e-05 ***
## regionOrlando             -5.505e-02  1.907e-02  -2.887 0.003892 ** 
## regionPhiladelphia         7.102e-02  1.907e-02   3.725 0.000196 ***
## regionPhoenixTucson       -3.367e-01  1.907e-02 -17.659  < 2e-16 ***
## regionPittsburgh          -1.967e-01  1.907e-02 -10.317  < 2e-16 ***
## regionPlains              -1.258e-01  1.907e-02  -6.594 4.39e-11 ***
## regionPortland            -2.434e-01  1.907e-02 -12.762  < 2e-16 ***
## regionRaleighGreensboro   -5.977e-03  1.907e-02  -0.313 0.753941    
## regionRichmondNorfolk     -2.698e-01  1.907e-02 -14.149  < 2e-16 ***
## regionRoanoke             -3.131e-01  1.907e-02 -16.423  < 2e-16 ***
## regionSacramento           6.034e-02  1.907e-02   3.164 0.001556 ** 
## regionSanDiego            -1.630e-01  1.907e-02  -8.548  < 2e-16 ***
## regionSanFrancisco         2.430e-01  1.907e-02  12.742  < 2e-16 ***
## regionSeattle             -1.185e-01  1.907e-02  -6.214 5.28e-10 ***
## regionSouthCarolina       -1.580e-01  1.907e-02  -8.284  < 2e-16 ***
## regionSouthCentral        -4.628e-01  1.909e-02 -24.240  < 2e-16 ***
## regionSoutheast           -1.659e-01  1.909e-02  -8.690  < 2e-16 ***
## regionSpokane             -1.154e-01  1.907e-02  -6.052 1.46e-09 ***
## regionStLouis             -1.306e-01  1.907e-02  -6.850 7.60e-12 ***
## regionSyracuse            -4.071e-02  1.907e-02  -2.135 0.032785 *  
## regionTampa               -1.524e-01  1.907e-02  -7.993 1.40e-15 ***
## regionTotalUS             -2.668e-01  2.064e-02 -12.928  < 2e-16 ***
## regionWest                -2.897e-01  1.907e-02 -15.193  < 2e-16 ***
## regionWestTexNewMexico    -2.969e-01  1.911e-02 -15.537  < 2e-16 ***
## quarter2                   6.536e-02  7.416e-03   8.814  < 2e-16 ***
## quarter3                   1.848e-01  7.423e-03  24.898  < 2e-16 ***
## quarter4                   1.530e-01  7.364e-03  20.776  < 2e-16 ***
## year2016                  -3.800e-02  4.689e-03  -8.102 5.72e-16 ***
## year2017                   1.374e-01  4.674e-03  29.392  < 2e-16 ***
## year2018                   8.529e-02  8.351e-03  10.213  < 2e-16 ***
## x_large_bags               3.916e-07  1.246e-07   3.142 0.001682 ** 
## typeorganic:quarter2       3.034e-02  1.015e-02   2.989 0.002800 ** 
## typeorganic:quarter3       6.653e-02  1.015e-02   6.553 5.80e-11 ***
## typeorganic:quarter4       1.817e-02  1.008e-02   1.803 0.071446 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2479 on 18184 degrees of freedom
## Multiple R-squared:  0.6224, Adjusted R-squared:  0.621 
## F-statistic: 468.3 on 64 and 18184 DF,  p-value: < 2.2e-16
model5pc <- lm(average_price ~ type + region + quarter + year + x_large_bags + type:year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pc)

summary(model5pc)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     x_large_bags + type:year, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.00898 -0.14443 -0.00472  0.13873  1.46680 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.118e+00  1.442e-02  77.501  < 2e-16 ***
## typeorganic                5.956e-01  6.569e-03  90.667  < 2e-16 ***
## regionAtlanta             -2.232e-01  1.892e-02 -11.796  < 2e-16 ***
## regionBaltimoreWashington -2.687e-02  1.892e-02  -1.420 0.155567    
## regionBoise               -2.129e-01  1.892e-02 -11.252  < 2e-16 ***
## regionBoston              -3.016e-02  1.892e-02  -1.594 0.110873    
## regionBuffaloRochester    -4.422e-02  1.892e-02  -2.337 0.019445 *  
## regionCalifornia          -1.678e-01  1.902e-02  -8.823  < 2e-16 ***
## regionCharlotte            4.499e-02  1.892e-02   2.378 0.017419 *  
## regionChicago             -4.393e-03  1.892e-02  -0.232 0.816388    
## regionCincinnatiDayton    -3.519e-01  1.892e-02 -18.601  < 2e-16 ***
## regionColumbus            -3.083e-01  1.892e-02 -16.297  < 2e-16 ***
## regionDallasFtWorth       -4.756e-01  1.892e-02 -25.137  < 2e-16 ***
## regionDenver              -3.425e-01  1.892e-02 -18.101  < 2e-16 ***
## regionDetroit             -2.856e-01  1.893e-02 -15.087  < 2e-16 ***
## regionGrandRapids         -5.635e-02  1.892e-02  -2.978 0.002904 ** 
## regionGreatLakes          -2.250e-01  1.906e-02 -11.803  < 2e-16 ***
## regionHarrisburgScranton  -4.780e-02  1.892e-02  -2.526 0.011537 *  
## regionHartfordSpringfield  2.576e-01  1.892e-02  13.615  < 2e-16 ***
## regionHouston             -5.132e-01  1.892e-02 -27.126  < 2e-16 ***
## regionIndianapolis        -2.471e-01  1.892e-02 -13.062  < 2e-16 ***
## regionJacksonville        -5.011e-02  1.892e-02  -2.649 0.008085 ** 
## regionLasVegas            -1.801e-01  1.892e-02  -9.520  < 2e-16 ***
## regionLosAngeles          -3.466e-01  1.898e-02 -18.265  < 2e-16 ***
## regionLouisville          -2.744e-01  1.892e-02 -14.502  < 2e-16 ***
## regionMiamiFtLauderdale   -1.326e-01  1.892e-02  -7.011 2.45e-12 ***
## regionMidsouth            -1.568e-01  1.893e-02  -8.285  < 2e-16 ***
## regionNashville           -3.490e-01  1.892e-02 -18.445  < 2e-16 ***
## regionNewOrleansMobile    -2.564e-01  1.892e-02 -13.553  < 2e-16 ***
## regionNewYork              1.664e-01  1.892e-02   8.796  < 2e-16 ***
## regionNortheast            4.039e-02  1.893e-02   2.134 0.032855 *  
## regionNorthernNewEngland  -8.367e-02  1.892e-02  -4.422 9.83e-06 ***
## regionOrlando             -5.490e-02  1.892e-02  -2.902 0.003714 ** 
## regionPhiladelphia         7.107e-02  1.892e-02   3.756 0.000173 ***
## regionPhoenixTucson       -3.366e-01  1.892e-02 -17.793  < 2e-16 ***
## regionPittsburgh          -1.967e-01  1.892e-02 -10.398  < 2e-16 ***
## regionPlains              -1.249e-01  1.892e-02  -6.603 4.14e-11 ***
## regionPortland            -2.433e-01  1.892e-02 -12.861  < 2e-16 ***
## regionRaleighGreensboro   -5.938e-03  1.892e-02  -0.314 0.753649    
## regionRichmondNorfolk     -2.697e-01  1.892e-02 -14.257  < 2e-16 ***
## regionRoanoke             -3.131e-01  1.892e-02 -16.550  < 2e-16 ***
## regionSacramento           6.047e-02  1.892e-02   3.196 0.001396 ** 
## regionSanDiego            -1.629e-01  1.892e-02  -8.611  < 2e-16 ***
## regionSanFrancisco         2.431e-01  1.892e-02  12.849  < 2e-16 ***
## regionSeattle             -1.185e-01  1.892e-02  -6.262 3.89e-10 ***
## regionSouthCarolina       -1.578e-01  1.892e-02  -8.342  < 2e-16 ***
## regionSouthCentral        -4.608e-01  1.894e-02 -24.326  < 2e-16 ***
## regionSoutheast           -1.640e-01  1.894e-02  -8.658  < 2e-16 ***
## regionSpokane             -1.154e-01  1.892e-02  -6.101 1.07e-09 ***
## regionStLouis             -1.305e-01  1.892e-02  -6.897 5.49e-12 ***
## regionSyracuse            -4.071e-02  1.892e-02  -2.152 0.031432 *  
## regionTampa               -1.523e-01  1.892e-02  -8.048 8.93e-16 ***
## regionTotalUS             -2.505e-01  2.049e-02 -12.226  < 2e-16 ***
## regionWest                -2.891e-01  1.892e-02 -15.280  < 2e-16 ***
## regionWestTexNewMexico    -2.967e-01  1.896e-02 -15.650  < 2e-16 ***
## quarter2                   8.091e-02  5.363e-03  15.085  < 2e-16 ***
## quarter3                   2.186e-01  5.366e-03  40.744  < 2e-16 ***
## quarter4                   1.620e-01  5.327e-03  30.417  < 2e-16 ***
## year2016                   2.694e-02  6.596e-03   4.084 4.45e-05 ***
## year2017                   2.152e-01  6.582e-03  32.691  < 2e-16 ***
## year2018                   1.641e-01  1.128e-02  14.549  < 2e-16 ***
## x_large_bags               1.338e-07  1.241e-07   1.078 0.281087    
## typeorganic:year2016      -1.285e-01  9.306e-03 -13.813  < 2e-16 ***
## typeorganic:year2017      -1.540e-01  9.275e-03 -16.600  < 2e-16 ***
## typeorganic:year2018      -1.548e-01  1.520e-02 -10.184  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.246 on 18184 degrees of freedom
## Multiple R-squared:  0.6282, Adjusted R-squared:  0.6269 
## F-statistic: 480.1 on 64 and 18184 DF,  p-value: < 2.2e-16
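The type:year terms can be read the same way: holding region, quarter and x_large_bags fixed, the estimated organic premium in a given year is the `typeorganic` coefficient plus that year's interaction term, which is zero for the reference year (presumably 2015, the first level of year). A worked calculation from the coefficients above, in the units of average_price:

```latex
\begin{aligned}
\text{premium}_{\text{ref}}  &= 0.5956 \\
\text{premium}_{2016} &= 0.5956 - 0.1285 = 0.4671 \\
\text{premium}_{2017} &= 0.5956 - 0.1540 = 0.4416 \\
\text{premium}_{2018} &= 0.5956 - 0.1548 = 0.4408
\end{aligned}
```

So this model suggests the organic premium narrowed by roughly 0.13 to 0.15 after the reference year and then stayed roughly flat.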
model5pd <- lm(average_price ~ type + region + quarter + year + x_large_bags + type:x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pd)

summary(model5pd)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     x_large_bags + type:x_large_bags, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.03574 -0.14591 -0.00478  0.14434  1.43935 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.168e+00  1.429e-02  81.734  < 2e-16 ***
## typeorganic                4.978e-01  3.757e-03 132.483  < 2e-16 ***
## regionAtlanta             -2.233e-01  1.909e-02 -11.701  < 2e-16 ***
## regionBaltimoreWashington -2.699e-02  1.909e-02  -1.414 0.157339    
## regionBoise               -2.130e-01  1.909e-02 -11.159  < 2e-16 ***
## regionBoston              -3.020e-02  1.909e-02  -1.582 0.113671    
## regionBuffaloRochester    -4.425e-02  1.909e-02  -2.318 0.020456 *  
## regionCalifornia          -1.717e-01  1.918e-02  -8.949  < 2e-16 ***
## regionCharlotte            4.497e-02  1.909e-02   2.356 0.018481 *  
## regionChicago             -4.644e-03  1.909e-02  -0.243 0.807777    
## regionCincinnatiDayton    -3.521e-01  1.909e-02 -18.446  < 2e-16 ***
## regionColumbus            -3.085e-01  1.909e-02 -16.160  < 2e-16 ***
## regionDallasFtWorth       -4.759e-01  1.909e-02 -24.932  < 2e-16 ***
## regionDenver              -3.425e-01  1.909e-02 -17.943  < 2e-16 ***
## regionDetroit             -2.868e-01  1.910e-02 -15.017  < 2e-16 ***
## regionGrandRapids         -5.695e-02  1.909e-02  -2.983 0.002857 ** 
## regionGreatLakes          -2.297e-01  1.923e-02 -11.947  < 2e-16 ***
## regionHarrisburgScranton  -4.788e-02  1.909e-02  -2.508 0.012135 *  
## regionHartfordSpringfield  2.576e-01  1.909e-02  13.494  < 2e-16 ***
## regionHouston             -5.134e-01  1.909e-02 -26.899  < 2e-16 ***
## regionIndianapolis        -2.473e-01  1.909e-02 -12.957  < 2e-16 ***
## regionJacksonville        -5.016e-02  1.909e-02  -2.628 0.008598 ** 
## regionLasVegas            -1.801e-01  1.909e-02  -9.435  < 2e-16 ***
## regionLosAngeles          -3.496e-01  1.915e-02 -18.263  < 2e-16 ***
## regionLouisville          -2.744e-01  1.909e-02 -14.377  < 2e-16 ***
## regionMiamiFtLauderdale   -1.328e-01  1.909e-02  -6.960 3.52e-12 ***
## regionMidsouth            -1.578e-01  1.909e-02  -8.265  < 2e-16 ***
## regionNashville           -3.490e-01  1.909e-02 -18.285  < 2e-16 ***
## regionNewOrleansMobile    -2.568e-01  1.909e-02 -13.453  < 2e-16 ***
## regionNewYork              1.662e-01  1.909e-02   8.706  < 2e-16 ***
## regionNortheast            3.944e-02  1.909e-02   2.066 0.038871 *  
## regionNorthernNewEngland  -8.372e-02  1.909e-02  -4.386 1.16e-05 ***
## regionOrlando             -5.505e-02  1.909e-02  -2.884 0.003929 ** 
## regionPhiladelphia         7.102e-02  1.909e-02   3.721 0.000199 ***
## regionPhoenixTucson       -3.367e-01  1.909e-02 -17.642  < 2e-16 ***
## regionPittsburgh          -1.967e-01  1.909e-02 -10.307  < 2e-16 ***
## regionPlains              -1.258e-01  1.909e-02  -6.589 4.56e-11 ***
## regionPortland            -2.447e-01  1.909e-02 -12.817  < 2e-16 ***
## regionRaleighGreensboro   -5.976e-03  1.909e-02  -0.313 0.754207    
## regionRichmondNorfolk     -2.698e-01  1.909e-02 -14.135  < 2e-16 ***
## regionRoanoke             -3.131e-01  1.909e-02 -16.406  < 2e-16 ***
## regionSacramento           6.034e-02  1.909e-02   3.161 0.001572 ** 
## regionSanDiego            -1.630e-01  1.909e-02  -8.539  < 2e-16 ***
## regionSanFrancisco         2.430e-01  1.909e-02  12.730  < 2e-16 ***
## regionSeattle             -1.212e-01  1.912e-02  -6.341 2.34e-10 ***
## regionSouthCarolina       -1.580e-01  1.909e-02  -8.276  < 2e-16 ***
## regionSouthCentral        -4.628e-01  1.911e-02 -24.214  < 2e-16 ***
## regionSoutheast           -1.658e-01  1.911e-02  -8.679  < 2e-16 ***
## regionSpokane             -1.156e-01  1.909e-02  -6.056 1.42e-09 ***
## regionStLouis             -1.306e-01  1.909e-02  -6.843 7.98e-12 ***
## regionSyracuse            -4.071e-02  1.909e-02  -2.133 0.032957 *  
## regionTampa               -1.524e-01  1.909e-02  -7.985 1.49e-15 ***
## regionTotalUS             -2.719e-01  2.084e-02 -13.048  < 2e-16 ***
## regionWest                -2.951e-01  1.920e-02 -15.366  < 2e-16 ***
## regionWestTexNewMexico    -2.970e-01  1.913e-02 -15.524  < 2e-16 ***
## quarter2                   8.054e-02  5.411e-03  14.885  < 2e-16 ***
## quarter3                   2.180e-01  5.414e-03  40.259  < 2e-16 ***
## quarter4                   1.616e-01  5.377e-03  30.058  < 2e-16 ***
## year2016                  -3.798e-02  4.694e-03  -8.092 6.25e-16 ***
## year2017                   1.370e-01  4.684e-03  29.241  < 2e-16 ***
## year2018                   8.319e-02  8.405e-03   9.898  < 2e-16 ***
## x_large_bags               3.865e-07  1.250e-07   3.091 0.001995 ** 
## typeorganic:x_large_bags   4.737e-04  1.827e-04   2.593 0.009522 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2481 on 18186 degrees of freedom
## Multiple R-squared:  0.6216, Adjusted R-squared:  0.6203 
## F-statistic: 481.8 on 62 and 18186 DF,  p-value: < 2.2e-16
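Because x_large_bags is numeric, the type:x_large_bags term changes its slope rather than shifting the intercept. Reading the coefficients above, and holding everything else fixed, the estimated change in average_price per extra-large bag is:

```latex
\begin{aligned}
\text{slope}_{\text{conventional}} &= 3.865 \times 10^{-7} \\
\text{slope}_{\text{organic}} &= 3.865 \times 10^{-7} + 4.737 \times 10^{-4} \approx 4.741 \times 10^{-4}
\end{aligned}
```

Per 1,000 extra-large bags that is about 0.0004 for conventional avocados against about 0.47 for organic ones, so the association is concentrated almost entirely in the organic type.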
model5pe <- lm(average_price ~ type + region + quarter + year + x_large_bags + region:quarter, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pe)

summary(model5pe)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     x_large_bags + region:quarter, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.06468 -0.14582  0.00048  0.14087  1.38018 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         1.216e+00  2.423e-02  50.190  < 2e-16 ***
## typeorganic                         4.985e-01  3.663e-03 136.095  < 2e-16 ***
## regionAtlanta                      -2.579e-01  3.388e-02  -7.611 2.85e-14 ***
## regionBaltimoreWashington          -8.986e-02  3.388e-02  -2.652 0.008000 ** 
## regionBoise                        -2.854e-01  3.388e-02  -8.424  < 2e-16 ***
## regionBoston                       -7.093e-03  3.388e-02  -0.209 0.834158    
## regionBuffaloRochester             -3.109e-02  3.388e-02  -0.918 0.358774    
## regionCalifornia                   -2.868e-01  3.394e-02  -8.450  < 2e-16 ***
## regionCharlotte                    -2.147e-02  3.388e-02  -0.634 0.526347    
## regionChicago                      -7.400e-02  3.388e-02  -2.184 0.028945 *  
## regionCincinnatiDayton             -4.353e-01  3.388e-02 -12.849  < 2e-16 ***
## regionColumbus                     -3.251e-01  3.388e-02  -9.595  < 2e-16 ***
## regionDallasFtWorth                -4.853e-01  3.388e-02 -14.325  < 2e-16 ***
## regionDenver                       -4.216e-01  3.388e-02 -12.443  < 2e-16 ***
## regionDetroit                      -3.074e-01  3.389e-02  -9.071  < 2e-16 ***
## regionGrandRapids                  -1.295e-01  3.388e-02  -3.824 0.000132 ***
## regionGreatLakes                   -2.769e-01  3.398e-02  -8.150 3.88e-16 ***
## regionHarrisburgScranton           -6.005e-02  3.388e-02  -1.773 0.076319 .  
## regionHartfordSpringfield           2.290e-01  3.388e-02   6.759 1.43e-11 ***
## regionHouston                      -5.375e-01  3.388e-02 -15.867  < 2e-16 ***
## regionIndianapolis                 -2.742e-01  3.388e-02  -8.093 6.20e-16 ***
## regionJacksonville                 -1.104e-01  3.388e-02  -3.259 0.001121 ** 
## regionLasVegas                     -2.907e-01  3.388e-02  -8.581  < 2e-16 ***
## regionLosAngeles                   -4.383e-01  3.391e-02 -12.923  < 2e-16 ***
## regionLouisville                   -2.956e-01  3.388e-02  -8.725  < 2e-16 ***
## regionMiamiFtLauderdale            -1.119e-01  3.388e-02  -3.302 0.000962 ***
## regionMidsouth                     -1.953e-01  3.388e-02  -5.764 8.33e-09 ***
## regionNashville                    -3.514e-01  3.388e-02 -10.372  < 2e-16 ***
## regionNewOrleansMobile             -3.177e-01  3.388e-02  -9.377  < 2e-16 ***
## regionNewYork                       1.048e-01  3.388e-02   3.094 0.001979 ** 
## regionNortheast                     1.933e-02  3.388e-02   0.570 0.568361    
## regionNorthernNewEngland           -5.982e-02  3.388e-02  -1.766 0.077455 .  
## regionOrlando                      -1.034e-01  3.388e-02  -3.053 0.002269 ** 
## regionPhiladelphia                  1.650e-02  3.388e-02   0.487 0.626253    
## regionPhoenixTucson                -4.456e-01  3.388e-02 -13.153  < 2e-16 ***
## regionPittsburgh                   -1.745e-01  3.388e-02  -5.151 2.62e-07 ***
## regionPlains                       -1.852e-01  3.388e-02  -5.466 4.67e-08 ***
## regionPortland                     -3.533e-01  3.388e-02 -10.429  < 2e-16 ***
## regionRaleighGreensboro            -5.803e-02  3.388e-02  -1.713 0.086751 .  
## regionRichmondNorfolk              -2.636e-01  3.388e-02  -7.782 7.52e-15 ***
## regionRoanoke                      -3.123e-01  3.388e-02  -9.217  < 2e-16 ***
## regionSacramento                   -2.741e-02  3.388e-02  -0.809 0.418472    
## regionSanDiego                     -2.868e-01  3.388e-02  -8.466  < 2e-16 ***
## regionSanFrancisco                  9.026e-02  3.388e-02   2.664 0.007726 ** 
## regionSeattle                      -2.589e-01  3.388e-02  -7.642 2.25e-14 ***
## regionSouthCarolina                -2.071e-01  3.388e-02  -6.114 9.93e-10 ***
## regionSouthCentral                 -4.798e-01  3.390e-02 -14.153  < 2e-16 ***
## regionSoutheast                    -2.084e-01  3.388e-02  -6.151 7.88e-10 ***
## regionSpokane                      -2.696e-01  3.388e-02  -7.958 1.85e-15 ***
## regionStLouis                      -1.910e-01  3.388e-02  -5.639 1.74e-08 ***
## regionSyracuse                     -2.764e-02  3.388e-02  -0.816 0.414661    
## regionTampa                        -1.532e-01  3.388e-02  -4.523 6.14e-06 ***
## regionTotalUS                      -3.151e-01  3.466e-02  -9.091  < 2e-16 ***
## regionWest                         -3.903e-01  3.388e-02 -11.520  < 2e-16 ***
## regionWestTexNewMexico             -3.665e-01  3.388e-02 -10.818  < 2e-16 ***
## quarter2                            8.528e-02  3.644e-02   2.341 0.019266 *  
## quarter3                            9.278e-02  3.644e-02   2.546 0.010895 *  
## quarter4                            7.165e-02  3.618e-02   1.981 0.047660 *  
## year2016                           -3.808e-02  4.577e-03  -8.319  < 2e-16 ***
## year2017                            1.373e-01  4.563e-03  30.081  < 2e-16 ***
## year2018                            8.513e-02  8.151e-03  10.444  < 2e-16 ***
## x_large_bags                        4.158e-07  1.233e-07   3.373 0.000746 ***
## regionAtlanta:quarter2             -8.875e-02  5.147e-02  -1.725 0.084627 .  
## regionBaltimoreWashington:quarter2  9.216e-02  5.147e-02   1.791 0.073359 .  
## regionBoise:quarter2               -9.544e-02  5.147e-02  -1.854 0.063692 .  
## regionBoston:quarter2               1.139e-02  5.147e-02   0.221 0.824911    
## regionBuffaloRochester:quarter2     8.166e-02  5.147e-02   1.587 0.112579    
## regionCalifornia:quarter2           4.240e-03  5.147e-02   0.082 0.934345    
## regionCharlotte:quarter2            6.218e-02  5.147e-02   1.208 0.226952    
## regionChicago:quarter2             -4.249e-03  5.147e-02  -0.083 0.934198    
## regionCincinnatiDayton:quarter2     1.014e-02  5.147e-02   0.197 0.843877    
## regionColumbus:quarter2            -9.402e-02  5.147e-02  -1.827 0.067727 .  
## regionDallasFtWorth:quarter2       -7.789e-02  5.147e-02  -1.513 0.130177    
## regionDenver:quarter2              -1.578e-02  5.147e-02  -0.307 0.759141    
## regionDetroit:quarter2             -3.691e-02  5.147e-02  -0.717 0.473257    
## regionGrandRapids:quarter2          1.363e-01  5.147e-02   2.649 0.008086 ** 
## regionGreatLakes:quarter2          -1.091e-02  5.147e-02  -0.212 0.832191    
## regionHarrisburgScranton:quarter2   6.543e-02  5.147e-02   1.271 0.203625    
## regionHartfordSpringfield:quarter2  6.725e-02  5.147e-02   1.307 0.191332    
## regionHouston:quarter2             -8.920e-02  5.147e-02  -1.733 0.083088 .  
## regionIndianapolis:quarter2        -6.425e-02  5.147e-02  -1.248 0.211928    
## regionJacksonville:quarter2         2.811e-02  5.147e-02   0.546 0.584928    
## regionLasVegas:quarter2            -7.424e-02  5.147e-02  -1.443 0.149173    
## regionLosAngeles:quarter2          -6.060e-02  5.147e-02  -1.177 0.239049    
## regionLouisville:quarter2          -7.449e-02  5.147e-02  -1.447 0.147834    
## regionMiamiFtLauderdale:quarter2   -1.020e-02  5.147e-02  -0.198 0.842828    
## regionMidsouth:quarter2            -1.515e-02  5.147e-02  -0.294 0.768501    
## regionNashville:quarter2           -1.026e-01  5.147e-02  -1.993 0.046304 *  
## regionNewOrleansMobile:quarter2     8.341e-02  5.147e-02   1.621 0.105105    
## regionNewYork:quarter2              8.732e-02  5.147e-02   1.697 0.089772 .  
## regionNortheast:quarter2            5.500e-02  5.147e-02   1.069 0.285265    
## regionNorthernNewEngland:quarter2  -6.770e-02  5.147e-02  -1.316 0.188354    
## regionOrlando:quarter2              1.769e-02  5.147e-02   0.344 0.731089    
## regionPhiladelphia:quarter2         1.100e-01  5.147e-02   2.137 0.032587 *  
## regionPhoenixTucson:quarter2       -1.980e-02  5.147e-02  -0.385 0.700459    
## regionPittsburgh:quarter2          -3.807e-02  5.147e-02  -0.740 0.459513    
## regionPlains:quarter2              -4.009e-03  5.147e-02  -0.078 0.937911    
## regionPortland:quarter2            -4.527e-02  5.147e-02  -0.880 0.379084    
## regionRaleighGreensboro:quarter2    1.832e-03  5.147e-02   0.036 0.971604    
## regionRichmondNorfolk:quarter2     -1.137e-01  5.147e-02  -2.209 0.027195 *  
## regionRoanoke:quarter2             -1.312e-01  5.147e-02  -2.550 0.010779 *  
## regionSacramento:quarter2           8.446e-02  5.147e-02   1.641 0.100786    
## regionSanDiego:quarter2            -3.285e-03  5.147e-02  -0.064 0.949106    
## regionSanFrancisco:quarter2         1.221e-01  5.147e-02   2.373 0.017637 *  
## regionSeattle:quarter2              1.210e-02  5.147e-02   0.235 0.814101    
## regionSouthCarolina:quarter2        2.735e-02  5.147e-02   0.531 0.595172    
## regionSouthCentral:quarter2        -7.164e-02  5.147e-02  -1.392 0.163922    
## regionSoutheast:quarter2           -9.837e-03  5.148e-02  -0.191 0.848456    
## regionSpokane:quarter2              9.803e-03  5.147e-02   0.190 0.848939    
## regionStLouis:quarter2              5.672e-02  5.147e-02   1.102 0.270444    
## regionSyracuse:quarter2             6.494e-02  5.147e-02   1.262 0.207015    
## regionTampa:quarter2                5.706e-03  5.147e-02   0.111 0.911722    
## regionTotalUS:quarter2             -1.476e-02  5.149e-02  -0.287 0.774329    
## regionWest:quarter2                -2.856e-02  5.147e-02  -0.555 0.578953    
## regionWestTexNewMexico:quarter2    -9.603e-02  5.166e-02  -1.859 0.063053 .  
## regionAtlanta:quarter3              1.224e-01  5.147e-02   2.378 0.017422 *  
## regionBaltimoreWashington:quarter3  9.538e-02  5.147e-02   1.853 0.063854 .  
## regionBoise:quarter3                2.521e-01  5.147e-02   4.898 9.79e-07 ***
## regionBoston:quarter3              -1.212e-03  5.147e-02  -0.024 0.981214    
## regionBuffaloRochester:quarter3    -3.416e-02  5.147e-02  -0.664 0.506909    
## regionCalifornia:quarter3           2.572e-01  5.147e-02   4.996 5.89e-07 ***
## regionCharlotte:quarter3            1.397e-01  5.147e-02   2.715 0.006641 ** 
## regionChicago:quarter3              1.740e-01  5.147e-02   3.381 0.000723 ***
## regionCincinnatiDayton:quarter3     2.128e-01  5.147e-02   4.135 3.57e-05 ***
## regionColumbus:quarter3             1.094e-01  5.147e-02   2.126 0.033525 *  
## regionDallasFtWorth:quarter3        2.363e-02  5.147e-02   0.459 0.646184    
## regionDenver:quarter3               2.124e-01  5.147e-02   4.128 3.68e-05 ***
## regionDetroit:quarter3              5.517e-02  5.147e-02   1.072 0.283742    
## regionGrandRapids:quarter3          9.166e-02  5.147e-02   1.781 0.074936 .  
## regionGreatLakes:quarter3           1.228e-01  5.147e-02   2.387 0.017003 *  
## regionHarrisburgScranton:quarter3   6.457e-03  5.147e-02   0.125 0.900153    
## regionHartfordSpringfield:quarter3  4.942e-02  5.147e-02   0.960 0.336930    
## regionHouston:quarter3              7.247e-02  5.147e-02   1.408 0.159093    
## regionIndianapolis:quarter3         9.223e-02  5.147e-02   1.792 0.073138 .  
## regionJacksonville:quarter3         1.680e-01  5.147e-02   3.265 0.001098 ** 
## regionLasVegas:quarter3             2.954e-01  5.147e-02   5.740 9.61e-09 ***
## regionLosAngeles:quarter3           2.150e-01  5.147e-02   4.178 2.96e-05 ***
## regionLouisville:quarter3           8.478e-02  5.147e-02   1.647 0.099505 .  
## regionMiamiFtLauderdale:quarter3   -7.307e-02  5.147e-02  -1.420 0.155672    
## regionMidsouth:quarter3             9.249e-02  5.147e-02   1.797 0.072360 .  
## regionNashville:quarter3            4.167e-02  5.147e-02   0.810 0.418085    
## regionNewOrleansMobile:quarter3     7.109e-02  5.147e-02   1.381 0.167222    
## regionNewYork:quarter3              1.121e-01  5.147e-02   2.177 0.029476 *  
## regionNortheast:quarter3            4.725e-02  5.147e-02   0.918 0.358649    
## regionNorthernNewEngland:quarter3  -1.389e-02  5.147e-02  -0.270 0.787273    
## regionOrlando:quarter3              1.156e-01  5.147e-02   2.245 0.024762 *  
## regionPhiladelphia:quarter3         8.202e-02  5.147e-02   1.594 0.111012    
## regionPhoenixTucson:quarter3        2.603e-01  5.147e-02   5.058 4.27e-07 ***
## regionPittsburgh:quarter3          -1.622e-02  5.147e-02  -0.315 0.752619    
## regionPlains:quarter3               1.348e-01  5.147e-02   2.619 0.008837 ** 
## regionPortland:quarter3             3.344e-01  5.147e-02   6.498 8.33e-11 ***
## regionRaleighGreensboro:quarter3    1.211e-01  5.147e-02   2.354 0.018600 *  
## regionRichmondNorfolk:quarter3      5.134e-02  5.147e-02   0.998 0.318528    
## regionRoanoke:quarter3              9.037e-02  5.147e-02   1.756 0.079127 .  
## regionSacramento:quarter3           1.815e-01  5.147e-02   3.527 0.000421 ***
## regionSanDiego:quarter3             2.805e-01  5.147e-02   5.451 5.08e-08 ***
## regionSanFrancisco:quarter3         3.126e-01  5.147e-02   6.074 1.27e-09 ***
## regionSeattle:quarter3              3.922e-01  5.147e-02   7.620 2.66e-14 ***
## regionSouthCarolina:quarter3        1.023e-01  5.147e-02   1.987 0.046905 *  
## regionSouthCentral:quarter3         4.390e-02  5.147e-02   0.853 0.393732    
## regionSoutheast:quarter3            1.067e-01  5.148e-02   2.073 0.038179 *  
## regionSpokane:quarter3              3.937e-01  5.147e-02   7.650 2.11e-14 ***
## regionStLouis:quarter3              1.916e-01  5.147e-02   3.723 0.000197 ***
## regionSyracuse:quarter3            -3.686e-02  5.147e-02  -0.716 0.473930    
## regionTampa:quarter3               -4.372e-02  5.147e-02  -0.850 0.395566    
## regionTotalUS:quarter3              9.405e-02  5.156e-02   1.824 0.068183 .  
## regionWest:quarter3                 2.980e-01  5.147e-02   5.791 7.13e-09 ***
## regionWestTexNewMexico:quarter3     1.785e-01  5.147e-02   3.469 0.000523 ***
## regionAtlanta:quarter4              1.130e-01  5.110e-02   2.210 0.027086 *  
## regionBaltimoreWashington:quarter4  8.270e-02  5.110e-02   1.618 0.105581    
## regionBoise:quarter4                1.538e-01  5.110e-02   3.009 0.002626 ** 
## regionBoston:quarter4              -1.075e-01  5.110e-02  -2.105 0.035345 *  
## regionBuffaloRochester:quarter4    -1.019e-01  5.110e-02  -1.994 0.046129 *  
## regionCalifornia:quarter4           2.298e-01  5.110e-02   4.496 6.96e-06 ***
## regionCharlotte:quarter4            8.383e-02  5.110e-02   1.641 0.100897    
## regionChicago:quarter4              1.274e-01  5.110e-02   2.493 0.012665 *  
## regionCincinnatiDayton:quarter4     1.341e-01  5.110e-02   2.625 0.008682 ** 
## regionColumbus:quarter4             5.517e-02  5.110e-02   1.080 0.280283    
## regionDallasFtWorth:quarter4        9.257e-02  5.110e-02   1.811 0.070085 .  
## regionDenver:quarter4               1.424e-01  5.110e-02   2.787 0.005319 ** 
## regionDetroit:quarter4              6.867e-02  5.110e-02   1.344 0.179058    
## regionGrandRapids:quarter4          8.416e-02  5.110e-02   1.647 0.099577 .  
## regionGreatLakes:quarter4           8.786e-02  5.111e-02   1.719 0.085637 .  
## regionHarrisburgScranton:quarter4  -1.870e-02  5.110e-02  -0.366 0.714421    
## regionHartfordSpringfield:quarter4  7.018e-03  5.110e-02   0.137 0.890763    
## regionHouston:quarter4              1.181e-01  5.110e-02   2.311 0.020852 *  
## regionIndianapolis:quarter4         8.610e-02  5.110e-02   1.685 0.092029 .  
## regionJacksonville:quarter4         6.326e-02  5.110e-02   1.238 0.215724    
## regionLasVegas:quarter4             2.517e-01  5.110e-02   4.925 8.50e-07 ***
## regionLosAngeles:quarter4           2.225e-01  5.110e-02   4.354 1.34e-05 ***
## regionLouisville:quarter4           7.942e-02  5.110e-02   1.554 0.120131    
## regionMiamiFtLauderdale:quarter4   -7.519e-03  5.110e-02  -0.147 0.883012    
## regionMidsouth:quarter4             8.252e-02  5.110e-02   1.615 0.106367    
## regionNashville:quarter4            6.942e-02  5.110e-02   1.358 0.174327    
## regionNewOrleansMobile:quarter4     1.065e-01  5.110e-02   2.085 0.037083 *  
## regionNewYork:quarter4              6.475e-02  5.110e-02   1.267 0.205122    
## regionNortheast:quarter4           -1.518e-02  5.110e-02  -0.297 0.766432    
## regionNorthernNewEngland:quarter4  -2.143e-02  5.110e-02  -0.419 0.674965    
## regionOrlando:quarter4              7.442e-02  5.110e-02   1.456 0.145298    
## regionPhiladelphia:quarter4         4.312e-02  5.110e-02   0.844 0.398758    
## regionPhoenixTucson:quarter4        2.254e-01  5.110e-02   4.412 1.03e-05 ***
## regionPittsburgh:quarter4          -4.100e-02  5.110e-02  -0.802 0.422337    
## regionPlains:quarter4               1.232e-01  5.110e-02   2.411 0.015911 *  
## regionPortland:quarter4             1.827e-01  5.110e-02   3.575 0.000351 ***
## regionRaleighGreensboro:quarter4    1.000e-01  5.110e-02   1.957 0.050323 .  
## regionRichmondNorfolk:quarter4      3.477e-02  5.110e-02   0.680 0.496243    
## regionRoanoke:quarter4              3.614e-02  5.110e-02   0.707 0.479434    
## regionSacramento:quarter4           1.114e-01  5.110e-02   2.179 0.029314 *  
## regionSanDiego:quarter4             2.529e-01  5.110e-02   4.949 7.54e-07 ***
## regionSanFrancisco:quarter4         2.213e-01  5.110e-02   4.331 1.49e-05 ***
## regionSeattle:quarter4              1.990e-01  5.110e-02   3.895 9.85e-05 ***
## regionSouthCarolina:quarter4        8.128e-02  5.110e-02   1.591 0.111701    
## regionSouthCentral:quarter4         9.805e-02  5.110e-02   1.919 0.055046 .  
## regionSoutheast:quarter4            8.435e-02  5.110e-02   1.651 0.098813 .  
## regionSpokane:quarter4              2.581e-01  5.110e-02   5.051 4.44e-07 ***
## regionStLouis:quarter4              1.303e-02  5.110e-02   0.255 0.798777    
## regionSyracuse:quarter4            -8.261e-02  5.110e-02  -1.617 0.105954    
## regionTampa:quarter4                4.048e-02  5.110e-02   0.792 0.428229    
## regionTotalUS:quarter4              1.204e-01  5.117e-02   2.352 0.018680 *  
## regionWest:quarter4                 1.619e-01  5.110e-02   3.169 0.001534 ** 
## regionWestTexNewMexico:quarter4     2.119e-01  5.119e-02   4.140 3.49e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2419 on 18028 degrees of freedom
## Multiple R-squared:  0.6434, Adjusted R-squared:  0.639 
## F-statistic: 147.8 on 220 and 18028 DF,  p-value: < 2.2e-16
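The `region:quarter` interaction adds 53 × 3 = 159 coefficients: one for each non-baseline region combined with each of the 3 non-baseline quarters. Adjusted R² penalises that extra complexity. The sketch below recomputes it from the printed R² values to show how little the penalty costs here. It assumes n = 18,249 rows, which is implied by both outputs as residual DF + predictor terms + 1. The helper `adj_r2` is hypothetical and written only for this check.

```r
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
# where p is the number of predictor terms (excluding the intercept)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Row count implied by the output: residual DF + predictor terms + intercept
n <- 18186 + 62 + 1   # = 18249; 18028 + 220 + 1 gives the same

# Predictor terms in model5pe: type (1) + region (53) + quarter (3) + year (3)
# + x_large_bags (1) + region:quarter (53 * 3)
p_pe <- 1 + 53 + 3 + 3 + 1 + 53 * 3   # = 220, matching "220 and 18028 DF"

fits <- data.frame(
  model = c("previous (type:x_large_bags)", "model5pe (region:quarter)"),
  p     = c(62, p_pe),
  r2    = c(0.6216, 0.6434)   # Multiple R-squared values from the two summaries
)
fits$adj_r2 <- adj_r2(fits$r2, n, fits$p)
print(fits, digits = 4)
```

The recomputed values match the printed 0.6203 and 0.639. Model5pe gains about 0.019 in adjusted R² but uses 158 more coefficients, and most of the quarter-2 interaction terms are not significant at the 5% level. For a formal test of the interaction, `anova()` on two nested `lm` fits gives an F-test. One suitable pair is model5pe against the same formula without `region:quarter`. The previous model is not nested in model5pe, because it contains `type:x_large_bags`, so it should not be compared with model5pe this way.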
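# Model 5pf: swap the region x quarter interaction for region x year, to test whether
# year-on-year price changes differ between regions (53 regions x 3 years = 159 extra coefficients)
model5pf <- lm(average_price ~ type + region + quarter + year + x_large_bags + region:year, data = trimmed_avocados)
# Standard diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model5pf)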

summary(model5pf)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     x_large_bags + region:year, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.03187 -0.14124 -0.00167  0.13786  1.38842 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         1.175e+00  2.396e-02  49.024  < 2e-16 ***
## typeorganic                         4.975e-01  3.662e-03 135.876  < 2e-16 ***
## regionAtlanta                      -1.582e-01  3.348e-02  -4.724 2.33e-06 ***
## regionBaltimoreWashington          -1.699e-01  3.348e-02  -5.075 3.92e-07 ***
## regionBoise                        -1.650e-01  3.348e-02  -4.928 8.38e-07 ***
## regionBoston                       -6.519e-02  3.348e-02  -1.947 0.051546 .  
## regionBuffaloRochester              5.865e-03  3.348e-02   0.175 0.860943    
## regionCalifornia                   -2.234e-01  3.348e-02  -6.671 2.61e-11 ***
## regionCharlotte                     3.702e-02  3.348e-02   1.106 0.268906    
## regionChicago                      -1.348e-01  3.348e-02  -4.026 5.69e-05 ***
## regionCincinnatiDayton             -3.367e-01  3.348e-02 -10.056  < 2e-16 ***
## regionColumbus                     -2.651e-01  3.348e-02  -7.919 2.54e-15 ***
## regionDallasFtWorth                -4.610e-01  3.348e-02 -13.768  < 2e-16 ***
## regionDenver                       -3.510e-01  3.348e-02 -10.482  < 2e-16 ***
## regionDetroit                      -2.016e-01  3.349e-02  -6.021 1.77e-09 ***
## regionGrandRapids                  -1.226e-01  3.348e-02  -3.662 0.000251 ***
## regionGreatLakes                   -2.160e-01  3.353e-02  -6.442 1.21e-10 ***
## regionHarrisburgScranton           -6.714e-02  3.348e-02  -2.005 0.044969 *  
## regionHartfordSpringfield           2.090e-01  3.348e-02   6.243 4.38e-10 ***
## regionHouston                      -4.908e-01  3.348e-02 -14.659  < 2e-16 ***
## regionIndianapolis                 -1.960e-01  3.348e-02  -5.854 4.87e-09 ***
## regionJacksonville                 -3.567e-02  3.348e-02  -1.065 0.286701    
## regionLasVegas                     -1.699e-01  3.348e-02  -5.074 3.93e-07 ***
## regionLosAngeles                   -3.867e-01  3.348e-02 -11.549  < 2e-16 ***
## regionLouisville                   -2.444e-01  3.348e-02  -7.300 3.00e-13 ***
## regionMiamiFtLauderdale            -1.552e-01  3.348e-02  -4.635 3.59e-06 ***
## regionMidsouth                     -1.876e-01  3.348e-02  -5.603 2.13e-08 ***
## regionNashville                    -2.616e-01  3.348e-02  -7.812 5.95e-15 ***
## regionNewOrleansMobile             -2.711e-01  3.348e-02  -8.095 6.07e-16 ***
## regionNewYork                       1.058e-01  3.348e-02   3.159 0.001587 ** 
## regionNortheast                     4.958e-03  3.348e-02   0.148 0.882293    
## regionNorthernNewEngland           -6.538e-02  3.348e-02  -1.953 0.050862 .  
## regionOrlando                      -3.943e-02  3.348e-02  -1.177 0.239023    
## regionPhiladelphia                  1.644e-02  3.348e-02   0.491 0.623441    
## regionPhoenixTucson                -3.816e-01  3.348e-02 -11.398  < 2e-16 ***
## regionPittsburgh                   -1.315e-01  3.348e-02  -3.929 8.58e-05 ***
## regionPlains                       -1.011e-01  3.348e-02  -3.019 0.002543 ** 
## regionPortland                     -2.320e-01  3.348e-02  -6.928 4.41e-12 ***
## regionRaleighGreensboro            -8.933e-02  3.348e-02  -2.668 0.007641 ** 
## regionRichmondNorfolk              -2.642e-01  3.348e-02  -7.892 3.15e-15 ***
## regionRoanoke                      -3.116e-01  3.348e-02  -9.307  < 2e-16 ***
## regionSacramento                   -8.471e-02  3.348e-02  -2.530 0.011414 *  
## regionSanDiego                     -2.645e-01  3.348e-02  -7.900 2.94e-15 ***
## regionSanFrancisco                  8.230e-02  3.348e-02   2.458 0.013980 *  
## regionSeattle                      -1.166e-01  3.348e-02  -3.481 0.000501 ***
## regionSouthCarolina                -8.404e-02  3.348e-02  -2.510 0.012085 *  
## regionSouthCentral                 -4.272e-01  3.348e-02 -12.759  < 2e-16 ***
## regionSoutheast                    -1.241e-01  3.348e-02  -3.705 0.000212 ***
## regionSpokane                      -1.384e-01  3.348e-02  -4.132 3.61e-05 ***
## regionStLouis                      -3.539e-02  3.348e-02  -1.057 0.290606    
## regionSyracuse                     -9.711e-03  3.348e-02  -0.290 0.771793    
## regionTampa                        -1.821e-01  3.348e-02  -5.439 5.42e-08 ***
## regionTotalUS                      -2.864e-01  3.358e-02  -8.531  < 2e-16 ***
## regionWest                         -3.011e-01  3.348e-02  -8.992  < 2e-16 ***
## regionWestTexNewMexico             -2.767e-01  3.356e-02  -8.243  < 2e-16 ***
## quarter2                            8.069e-02  5.265e-03  15.325  < 2e-16 ***
## quarter3                            2.184e-01  5.268e-03  41.450  < 2e-16 ***
## quarter4                            1.620e-01  5.229e-03  30.989  < 2e-16 ***
## year2016                           -4.861e-03  3.348e-02  -0.145 0.884570    
## year2017                            9.815e-02  3.332e-02   2.945 0.003230 ** 
## year2018                            1.233e-02  5.477e-02   0.225 0.821920    
## x_large_bags                        2.575e-07  1.277e-07   2.017 0.043674 *  
## regionAtlanta:year2016             -1.617e-01  4.735e-02  -3.414 0.000641 ***
## regionBaltimoreWashington:year2016  2.234e-01  4.735e-02   4.718 2.40e-06 ***
## regionBoise:year2016               -2.270e-01  4.735e-02  -4.793 1.65e-06 ***
## regionBoston:year2016              -4.263e-02  4.735e-02  -0.900 0.367957    
## regionBuffaloRochester:year2016    -5.600e-02  4.735e-02  -1.183 0.237011    
## regionCalifornia:year2016           1.603e-02  4.737e-02   0.338 0.735153    
## regionCharlotte:year2016           -7.308e-02  4.735e-02  -1.543 0.122784    
## regionChicago:year2016              1.479e-01  4.735e-02   3.123 0.001794 ** 
## regionCincinnatiDayton:year2016    -1.091e-01  4.735e-02  -2.304 0.021212 *  
## regionColumbus:year2016            -8.266e-02  4.735e-02  -1.746 0.080880 .  
## regionDallasFtWorth:year2016       -7.729e-02  4.735e-02  -1.632 0.102631    
## regionDenver:year2016              -8.983e-02  4.735e-02  -1.897 0.057824 .  
## regionDetroit:year2016             -1.613e-01  4.735e-02  -3.406 0.000661 ***
## regionGrandRapids:year2016          9.775e-02  4.735e-02   2.064 0.038991 *  
## regionGreatLakes:year2016          -4.595e-02  4.736e-02  -0.970 0.331929    
## regionHarrisburgScranton:year2016   4.470e-02  4.735e-02   0.944 0.345203    
## regionHartfordSpringfield:year2016  1.080e-01  4.735e-02   2.282 0.022511 *  
## regionHouston:year2016             -5.176e-02  4.735e-02  -1.093 0.274356    
## regionIndianapolis:year2016        -3.665e-02  4.735e-02  -0.774 0.438916    
## regionJacksonville:year2016        -1.307e-01  4.735e-02  -2.759 0.005797 ** 
## regionLasVegas:year2016            -1.159e-02  4.735e-02  -0.245 0.806598    
## regionLosAngeles:year2016          -6.631e-02  4.737e-02  -1.400 0.161523    
## regionLouisville:year2016          -7.807e-02  4.735e-02  -1.649 0.099214 .  
## regionMiamiFtLauderdale:year2016   -9.933e-02  4.735e-02  -2.098 0.035939 *  
## regionMidsouth:year2016             3.128e-03  4.736e-02   0.066 0.947330    
## regionNashville:year2016           -1.562e-01  4.735e-02  -3.299 0.000971 ***
## regionNewOrleansMobile:year2016    -1.471e-02  4.735e-02  -0.311 0.756016    
## regionNewYork:year2016              1.222e-01  4.735e-02   2.581 0.009869 ** 
## regionNortheast:year2016            5.556e-02  4.736e-02   1.173 0.240727    
## regionNorthernNewEngland:year2016  -7.595e-02  4.735e-02  -1.604 0.108755    
## regionOrlando:year2016             -1.241e-01  4.735e-02  -2.620 0.008803 ** 
## regionPhiladelphia:year2016         1.244e-01  4.735e-02   2.627 0.008634 ** 
## regionPhoenixTucson:year2016        1.065e-01  4.735e-02   2.248 0.024571 *  
## regionPittsburgh:year2016          -5.907e-02  4.735e-02  -1.247 0.212245    
## regionPlains:year2016              -5.670e-02  4.736e-02  -1.197 0.231193    
## regionPortland:year2016            -1.103e-01  4.735e-02  -2.330 0.019806 *  
## regionRaleighGreensboro:year2016    3.099e-03  4.735e-02   0.065 0.947823    
## regionRichmondNorfolk:year2016     -5.864e-02  4.735e-02  -1.238 0.215586    
## regionRoanoke:year2016             -7.486e-02  4.735e-02  -1.581 0.113932    
## regionSacramento:year2016           2.189e-01  4.735e-02   4.623 3.80e-06 ***
## regionSanDiego:year2016             4.432e-02  4.735e-02   0.936 0.349308    
## regionSanFrancisco:year2016         2.650e-01  4.735e-02   5.596 2.23e-08 ***
## regionSeattle:year2016             -1.171e-01  4.735e-02  -2.473 0.013403 *  
## regionSouthCarolina:year2016       -1.449e-01  4.735e-02  -3.060 0.002215 ** 
## regionSouthCentral:year2016        -8.303e-02  4.737e-02  -1.753 0.079655 .  
## regionSoutheast:year2016           -1.250e-01  4.736e-02  -2.640 0.008305 ** 
## regionSpokane:year2016             -6.197e-02  4.735e-02  -1.309 0.190631    
## regionStLouis:year2016             -3.134e-01  4.735e-02  -6.619 3.73e-11 ***
## regionSyracuse:year2016            -2.077e-02  4.735e-02  -0.439 0.660952    
## regionTampa:year2016               -8.761e-02  4.735e-02  -1.850 0.064290 .  
## regionTotalUS:year2016             -2.523e-03  4.782e-02  -0.053 0.957917    
## regionWest:year2016                -5.260e-02  4.735e-02  -1.111 0.266681    
## regionWestTexNewMexico:year2016    -1.118e-02  4.741e-02  -0.236 0.813538    
## regionAtlanta:year2017             -5.133e-02  4.713e-02  -1.089 0.276065    
## regionBaltimoreWashington:year2017  2.113e-01  4.713e-02   4.483 7.40e-06 ***
## regionBoise:year2017                1.985e-02  4.713e-02   0.421 0.673552    
## regionBoston:year2017               1.068e-01  4.713e-02   2.267 0.023398 *  
## regionBuffaloRochester:year2017    -5.601e-02  4.713e-02  -1.189 0.234633    
## regionCalifornia:year2017           1.126e-01  4.723e-02   2.384 0.017123 *  
## regionCharlotte:year2017            9.489e-02  4.713e-02   2.013 0.044081 *  
## regionChicago:year2017              2.114e-01  4.713e-02   4.486 7.31e-06 ***
## regionCincinnatiDayton:year2017     1.831e-02  4.713e-02   0.389 0.697600    
## regionColumbus:year2017            -5.701e-02  4.713e-02  -1.210 0.226414    
## regionDallasFtWorth:year2017        1.122e-04  4.713e-02   0.002 0.998100    
## regionDenver:year2017               7.089e-02  4.713e-02   1.504 0.132563    
## regionDetroit:year2017             -9.823e-02  4.713e-02  -2.084 0.037149 *  
## regionGrandRapids:year2017          1.115e-01  4.713e-02   2.366 0.017969 *  
## regionGreatLakes:year2017          -2.282e-03  4.713e-02  -0.048 0.961394    
## regionHarrisburgScranton:year2017   2.497e-02  4.713e-02   0.530 0.596285    
## regionHartfordSpringfield:year2017  4.139e-02  4.713e-02   0.878 0.379792    
## regionHouston:year2017             -4.291e-02  4.713e-02  -0.911 0.362553    
## regionIndianapolis:year2017        -1.111e-01  4.713e-02  -2.357 0.018445 *  
## regionJacksonville:year2017         6.928e-02  4.713e-02   1.470 0.141549    
## regionLasVegas:year2017            -5.007e-02  4.713e-02  -1.062 0.288096    
## regionLosAngeles:year2017           1.211e-01  4.719e-02   2.566 0.010299 *  
## regionLouisville:year2017          -3.632e-02  4.713e-02  -0.771 0.440950    
## regionMiamiFtLauderdale:year2017    1.547e-01  4.713e-02   3.282 0.001034 ** 
## regionMidsouth:year2017             6.894e-02  4.713e-02   1.463 0.143549    
## regionNashville:year2017           -1.364e-01  4.713e-02  -2.894 0.003810 ** 
## regionNewOrleansMobile:year2017     5.163e-02  4.713e-02   1.095 0.273319    
## regionNewYork:year2017              6.570e-02  4.713e-02   1.394 0.163320    
## regionNortheast:year2017            4.965e-02  4.713e-02   1.053 0.292145    
## regionNorthernNewEngland:year2017   4.649e-03  4.713e-02   0.099 0.921414    
## regionOrlando:year2017              8.160e-02  4.713e-02   1.731 0.083380 .  
## regionPhiladelphia:year2017         5.293e-02  4.713e-02   1.123 0.261438    
## regionPhoenixTucson:year2017        1.622e-02  4.713e-02   0.344 0.730705    
## regionPittsburgh:year2017          -1.434e-01  4.713e-02  -3.042 0.002352 ** 
## regionPlains:year2017              -2.733e-02  4.713e-02  -0.580 0.561981    
## regionPortland:year2017             2.846e-02  4.713e-02   0.604 0.545918    
## regionRaleighGreensboro:year2017    2.201e-01  4.713e-02   4.671 3.02e-06 ***
## regionRichmondNorfolk:year2017      2.554e-02  4.713e-02   0.542 0.587832    
## regionRoanoke:year2017              3.208e-02  4.713e-02   0.681 0.496074    
## regionSacramento:year2017           2.207e-01  4.713e-02   4.683 2.84e-06 ***
## regionSanDiego:year2017             2.111e-01  4.713e-02   4.478 7.57e-06 ***
## regionSanFrancisco:year2017         2.455e-01  4.713e-02   5.210 1.91e-07 ***
## regionSeattle:year2017              7.804e-02  4.713e-02   1.656 0.097776 .  
## regionSouthCarolina:year2017       -7.433e-02  4.713e-02  -1.577 0.114754    
## regionSouthCentral:year2017        -4.930e-02  4.713e-02  -1.046 0.295573    
## regionSoutheast:year2017           -5.160e-03  4.716e-02  -0.109 0.912881    
## regionSpokane:year2017              1.051e-01  4.713e-02   2.230 0.025752 *  
## regionStLouis:year2017             -1.074e-02  4.713e-02  -0.228 0.819670    
## regionSyracuse:year2017            -3.869e-02  4.713e-02  -0.821 0.411693    
## regionTampa:year2017                1.635e-01  4.713e-02   3.469 0.000525 ***
## regionTotalUS:year2017              6.318e-02  4.787e-02   1.320 0.186917    
## regionWest:year2017                 5.256e-02  4.713e-02   1.115 0.264729    
## regionWestTexNewMexico:year2017    -7.553e-02  4.730e-02  -1.597 0.110308    
## regionAtlanta:year2018              1.072e-02  7.733e-02   0.139 0.889782    
## regionBaltimoreWashington:year2018  1.124e-01  7.733e-02   1.453 0.146104    
## regionBoise:year2018                2.217e-01  7.733e-02   2.867 0.004149 ** 
## regionBoston:year2018               2.059e-01  7.733e-02   2.663 0.007742 ** 
## regionBuffaloRochester:year2018    -2.155e-01  7.733e-02  -2.787 0.005332 ** 
## regionCalifornia:year2018           1.892e-01  7.746e-02   2.443 0.014578 *  
## regionCharlotte:year2018            9.679e-03  7.733e-02   0.125 0.900390    
## regionChicago:year2018              2.606e-01  7.733e-02   3.370 0.000753 ***
## regionCincinnatiDayton:year2018     1.764e-01  7.733e-02   2.281 0.022553 *  
## regionColumbus:year2018             9.079e-04  7.733e-02   0.012 0.990632    
## regionDallasFtWorth:year2018        1.265e-01  7.733e-02   1.636 0.101773    
## regionDenver:year2018               1.960e-01  7.733e-02   2.535 0.011261 *  
## regionDetroit:year2018             -5.785e-02  7.733e-02  -0.748 0.454421    
## regionGrandRapids:year2018          1.287e-02  7.733e-02   0.166 0.867847    
## regionGreatLakes:year2018           4.963e-02  7.737e-02   0.642 0.521201    
## regionHarrisburgScranton:year2018  -3.216e-02  7.733e-02  -0.416 0.677506    
## regionHartfordSpringfield:year2018  3.256e-02  7.733e-02   0.421 0.673717    
## regionHouston:year2018              9.701e-02  7.733e-02   1.255 0.209633    
## regionIndianapolis:year2018        -7.169e-02  7.733e-02  -0.927 0.353907    
## regionJacksonville:year2018         5.652e-02  7.733e-02   0.731 0.464859    
## regionLasVegas:year2018             1.278e-01  7.733e-02   1.653 0.098411 .  
## regionLosAngeles:year2018           2.964e-01  7.738e-02   3.831 0.000128 ***
## regionLouisville:year2018           7.646e-02  7.733e-02   0.989 0.322751    
## regionMiamiFtLauderdale:year2018    6.354e-02  7.733e-02   0.822 0.411224    
## regionMidsouth:year2018             1.091e-01  7.733e-02   1.411 0.158407    
## regionNashville:year2018            4.803e-02  7.733e-02   0.621 0.534514    
## regionNewOrleansMobile:year2018     3.935e-02  7.733e-02   0.509 0.610832    
## regionNewYork:year2018              3.283e-02  7.733e-02   0.425 0.671123    
## regionNortheast:year2018            3.237e-02  7.733e-02   0.419 0.675477    
## regionNorthernNewEngland:year2018   5.076e-02  7.733e-02   0.656 0.511534    
## regionOrlando:year2018             -4.181e-02  7.733e-02  -0.541 0.588694    
## regionPhiladelphia:year2018        -3.644e-03  7.733e-02  -0.047 0.962413    
## regionPhoenixTucson:year2018        1.001e-01  7.733e-02   1.295 0.195404    
## regionPittsburgh:year2018          -2.885e-02  7.733e-02  -0.373 0.709069    
## regionPlains:year2018               2.461e-02  7.733e-02   0.318 0.750245    
## regionPortland:year2018             1.923e-01  7.733e-02   2.487 0.012901 *  
## regionRaleighGreensboro:year2018    1.885e-01  7.733e-02   2.438 0.014775 *  
## regionRichmondNorfolk:year2018      6.336e-02  7.733e-02   0.819 0.412568    
## regionRoanoke:year2018              1.616e-01  7.733e-02   2.090 0.036623 *  
## regionSacramento:year2018           1.202e-01  7.733e-02   1.555 0.119988    
## regionSanDiego:year2018             3.063e-01  7.733e-02   3.962 7.48e-05 ***
## regionSanFrancisco:year2018         3.107e-02  7.733e-02   0.402 0.687859    
## regionSeattle:year2018              1.357e-01  7.733e-02   1.754 0.079378 .  
## regionSouthCarolina:year2018       -8.383e-02  7.733e-02  -1.084 0.278339    
## regionSouthCentral:year2018         9.105e-02  7.736e-02   1.177 0.239230    
## regionSoutheast:year2018           -1.071e-02  7.733e-02  -0.139 0.889809    
## regionSpokane:year2018              1.276e-01  7.733e-02   1.650 0.099053 .  
## regionStLouis:year2018              6.527e-02  7.733e-02   0.844 0.398628    
## regionSyracuse:year2018            -1.757e-01  7.733e-02  -2.272 0.023105 *  
## regionTampa:year2018                7.714e-02  7.733e-02   0.998 0.318505    
## regionTotalUS:year2018              1.279e-01  7.829e-02   1.633 0.102468    
## regionWest:year2018                 1.602e-01  7.733e-02   2.072 0.038312 *  
## regionWestTexNewMexico:year2018     9.183e-02  7.736e-02   1.187 0.235229    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2414 on 18028 degrees of freedom
## Multiple R-squared:  0.6448, Adjusted R-squared:  0.6405 
## F-statistic: 148.8 on 220 and 18028 DF,  p-value: < 2.2e-16
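The region:year model above uses 220 terms (plus the intercept) to reach an adjusted R-squared of 0.6405. With that many dummy variables, the in-sample fit alone cannot tell us whether the extra terms generalise. A minimal k-fold cross-validation sketch is below. It assumes a hypothetical helper `cv_rmse()` (not part of any package) and uses the built-in `iris` data purely as a stand-in, so it runs on its own; for the avocado models you would pass the same formulas used in the `lm()` calls together with `trimmed_avocados`.

```r
# Hypothetical helper: k-fold cross-validated RMSE for an lm() formula.
# Uses only base R; iris is a stand-in for the avocado data.
cv_rmse <- function(formula, data, k = 5, seed = 1) {
  set.seed(seed)
  # Randomly assign each row to one of k folds of roughly equal size
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  sq_err <- numeric(0)
  for (i in seq_len(k)) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    fit   <- lm(formula, data = train)
    pred  <- predict(fit, newdata = test)
    # Collect squared prediction errors on the held-out fold
    sq_err <- c(sq_err, (test[[response]] - pred)^2)
  }
  sqrt(mean(sq_err))
}

# Compare a main-effects model with the same model plus an interaction
base_rmse  <- cv_rmse(Sepal.Length ~ Petal.Length + Species, iris)
inter_rmse <- cv_rmse(Sepal.Length ~ Petal.Length + Species + Petal.Length:Species, iris)
c(main_effects = base_rmse, with_interaction = inter_rmse)
```

If the interaction model does not lower the cross-validated RMSE, its extra coefficients are probably fitting noise rather than real structure.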
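# Allow the effect of x_large_bags on price to differ by region (region:x_large_bags interaction)
model5pg <- lm(average_price ~ type + region + quarter + year + x_large_bags + region:x_large_bags, data = trimmed_avocados)
# 2 x 2 grid of diagnostic plots: residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(mfrow = c(2, 2))
plot(model5pg)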

summary(model5pg)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     x_large_bags + region:x_large_bags, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.00590 -0.14516 -0.00347  0.14267  1.44125 
## 
## Coefficients:
##                                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)                             1.157e+00  1.475e-02  78.422  < 2e-16
## typeorganic                             4.999e-01  4.037e-03 123.829  < 2e-16
## regionAtlanta                          -2.087e-01  2.013e-02 -10.371  < 2e-16
## regionBaltimoreWashington              -3.231e-02  1.986e-02  -1.627 0.103767
## regionBoise                            -1.940e-01  1.977e-02  -9.809  < 2e-16
## regionBoston                           -3.887e-02  1.994e-02  -1.949 0.051298
## regionBuffaloRochester                 -3.757e-02  1.981e-02  -1.896 0.057934
## regionCalifornia                       -1.486e-01  2.138e-02  -6.952 3.73e-12
## regionCharlotte                         5.896e-02  1.972e-02   2.989 0.002799
## regionChicago                          -1.992e-02  2.044e-02  -0.975 0.329738
## regionCincinnatiDayton                 -3.552e-01  2.050e-02 -17.331  < 2e-16
## regionColumbus                         -3.118e-01  2.049e-02 -15.216  < 2e-16
## regionDallasFtWorth                    -4.683e-01  1.983e-02 -23.612  < 2e-16
## regionDenver                           -3.389e-01  1.973e-02 -17.180  < 2e-16
## regionDetroit                          -3.106e-01  2.107e-02 -14.743  < 2e-16
## regionGrandRapids                      -7.090e-02  2.040e-02  -3.475 0.000512
## regionGreatLakes                       -2.471e-01  2.142e-02 -11.533  < 2e-16
## regionHarrisburgScranton               -3.937e-02  1.982e-02  -1.986 0.047013
## regionHartfordSpringfield               2.737e-01  1.994e-02  13.724  < 2e-16
## regionHouston                          -5.100e-01  1.971e-02 -25.876  < 2e-16
## regionIndianapolis                     -2.525e-01  2.077e-02 -12.158  < 2e-16
## regionJacksonville                     -3.545e-02  1.987e-02  -1.784 0.074463
## regionLasVegas                         -1.643e-01  1.982e-02  -8.289  < 2e-16
## regionLosAngeles                       -3.536e-01  2.157e-02 -16.388  < 2e-16
## regionLouisville                       -2.774e-01  2.048e-02 -13.545  < 2e-16
## regionMiamiFtLauderdale                -1.309e-01  1.981e-02  -6.605 4.08e-11
## regionMidsouth                         -1.541e-01  2.005e-02  -7.685 1.60e-14
## regionNashville                        -3.399e-01  2.060e-02 -16.496  < 2e-16
## regionNewOrleansMobile                 -2.666e-01  2.032e-02 -13.122  < 2e-16
## regionNewYork                           1.685e-01  1.998e-02   8.431  < 2e-16
## regionNortheast                         4.073e-02  1.997e-02   2.039 0.041431
## regionNorthernNewEngland               -8.220e-02  1.977e-02  -4.158 3.22e-05
## regionOrlando                          -3.994e-02  1.979e-02  -2.019 0.043541
## regionPhiladelphia                      7.529e-02  1.985e-02   3.792 0.000150
## regionPhoenixTucson                    -2.935e-01  1.998e-02 -14.689  < 2e-16
## regionPittsburgh                       -1.966e-01  1.969e-02  -9.988  < 2e-16
## regionPlains                           -1.101e-01  2.035e-02  -5.409 6.42e-08
## regionPortland                         -2.287e-01  2.014e-02 -11.354  < 2e-16
## regionRaleighGreensboro                 8.980e-03  1.965e-02   0.457 0.647707
## regionRichmondNorfolk                  -2.639e-01  1.975e-02 -13.361  < 2e-16
## regionRoanoke                          -3.083e-01  1.979e-02 -15.577  < 2e-16
## regionSacramento                        9.105e-02  2.024e-02   4.498 6.89e-06
## regionSanDiego                         -1.403e-01  2.038e-02  -6.887 5.89e-12
## regionSanFrancisco                      2.908e-01  2.051e-02  14.180  < 2e-16
## regionSeattle                          -1.056e-01  2.097e-02  -5.035 4.81e-07
## regionSouthCarolina                    -1.508e-01  2.000e-02  -7.541 4.87e-14
## regionSouthCentral                     -4.547e-01  2.026e-02 -22.448  < 2e-16
## regionSoutheast                        -1.581e-01  2.016e-02  -7.843 4.63e-15
## regionSpokane                          -8.415e-02  2.025e-02  -4.156 3.25e-05
## regionStLouis                          -1.118e-01  1.972e-02  -5.670 1.45e-08
## regionSyracuse                         -3.950e-02  1.975e-02  -2.000 0.045480
## regionTampa                            -1.470e-01  1.980e-02  -7.427 1.16e-13
## regionTotalUS                          -2.436e-01  2.125e-02 -11.463  < 2e-16
## regionWest                             -2.673e-01  2.074e-02 -12.891  < 2e-16
## regionWestTexNewMexico                 -2.812e-01  1.972e-02 -14.260  < 2e-16
## quarter2                                7.926e-02  5.408e-03  14.654  < 2e-16
## quarter3                                2.156e-01  5.472e-03  39.402  < 2e-16
## quarter4                                1.645e-01  5.346e-03  30.762  < 2e-16
## year2016                               -3.867e-02  4.730e-03  -8.175 3.14e-16
## year2017                                1.399e-01  4.764e-03  29.355  < 2e-16
## year2018                                9.389e-02  8.481e-03  11.071  < 2e-16
## x_large_bags                            6.780e-05  3.165e-05   2.142 0.032202
## regionAtlanta:x_large_bags             -7.465e-05  3.229e-05  -2.311 0.020817
## regionBaltimoreWashington:x_large_bags -4.459e-05  3.238e-05  -1.377 0.168604
## regionBoise:x_large_bags               -3.981e-04  1.293e-04  -3.079 0.002077
## regionBoston:x_large_bags               1.603e-06  3.660e-05   0.044 0.965060
## regionBuffaloRochester:x_large_bags    -5.922e-05  3.576e-05  -1.656 0.097767
## regionCalifornia:x_large_bags          -6.834e-05  3.165e-05  -2.159 0.030867
## regionCharlotte:x_large_bags           -9.327e-05  3.610e-05  -2.584 0.009779
## regionChicago:x_large_bags             -4.606e-05  3.215e-05  -1.433 0.151906
## regionCincinnatiDayton:x_large_bags    -5.231e-05  3.275e-05  -1.597 0.110224
## regionColumbus:x_large_bags            -4.902e-05  3.321e-05  -1.476 0.139968
## regionDallasFtWorth:x_large_bags       -6.657e-05  3.181e-05  -2.093 0.036381
## regionDenver:x_large_bags              -3.469e-05  3.926e-05  -0.884 0.376950
## regionDetroit:x_large_bags             -6.075e-05  3.169e-05  -1.917 0.055232
## regionGrandRapids:x_large_bags         -5.830e-05  3.174e-05  -1.837 0.066265
## regionGreatLakes:x_large_bags          -6.604e-05  3.165e-05  -2.086 0.036958
## regionHarrisburgScranton:x_large_bags  -6.708e-05  3.286e-05  -2.042 0.041194
## regionHartfordSpringfield:x_large_bags -9.982e-05  3.752e-05  -2.660 0.007810
## regionHouston:x_large_bags             -6.198e-05  3.185e-05  -1.946 0.051713
## regionIndianapolis:x_large_bags        -5.092e-05  3.283e-05  -1.551 0.120934
## regionJacksonville:x_large_bags        -8.681e-05  3.455e-05  -2.513 0.011991
## regionLasVegas:x_large_bags            -2.179e-04  9.195e-05  -2.370 0.017784
## regionLosAngeles:x_large_bags          -6.637e-05  3.166e-05  -2.097 0.036039
## regionLouisville:x_large_bags          -3.106e-05  3.772e-05  -0.823 0.410237
## regionMiamiFtLauderdale:x_large_bags   -6.002e-05  3.195e-05  -1.879 0.060329
## regionMidsouth:x_large_bags            -6.620e-05  3.167e-05  -2.090 0.036587
## regionNashville:x_large_bags           -6.882e-05  3.798e-05  -1.812 0.070007
## regionNewOrleansMobile:x_large_bags    -5.523e-05  3.188e-05  -1.732 0.083257
## regionNewYork:x_large_bags             -6.142e-05  3.195e-05  -1.922 0.054590
## regionNortheast:x_large_bags           -6.551e-05  3.167e-05  -2.069 0.038594
## regionNorthernNewEngland:x_large_bags  -4.558e-05  3.371e-05  -1.352 0.176342
## regionOrlando:x_large_bags             -7.638e-05  3.209e-05  -2.380 0.017329
## regionPhiladelphia:x_large_bags        -5.342e-05  3.439e-05  -1.553 0.120351
## regionPhoenixTucson:x_large_bags       -1.429e-04  3.331e-05  -4.291 1.79e-05
## regionPittsburgh:x_large_bags          -1.719e-05  3.737e-05  -0.460 0.645586
## regionPlains:x_large_bags              -6.954e-05  3.170e-05  -2.194 0.028244
## regionPortland:x_large_bags            -9.347e-05  3.944e-05  -2.370 0.017806
## regionRaleighGreensboro:x_large_bags   -8.980e-05  3.359e-05  -2.674 0.007512
## regionRichmondNorfolk:x_large_bags     -5.979e-05  3.324e-05  -1.799 0.072098
## regionRoanoke:x_large_bags             -5.109e-05  3.583e-05  -1.426 0.153899
## regionSacramento:x_large_bags          -1.031e-04  3.298e-05  -3.126 0.001774
## regionSanDiego:x_large_bags            -1.000e-04  3.480e-05  -2.874 0.004055
## regionSanFrancisco:x_large_bags        -1.300e-04  3.336e-05  -3.896 9.81e-05
## regionSeattle:x_large_bags             -8.839e-05  5.055e-05  -1.749 0.080361
## regionSouthCarolina:x_large_bags       -6.511e-05  3.246e-05  -2.006 0.044848
## regionSouthCentral:x_large_bags        -6.733e-05  3.166e-05  -2.127 0.033439
## regionSoutheast:x_large_bags           -6.729e-05  3.166e-05  -2.126 0.033558
## regionSpokane:x_large_bags             -1.130e-03  2.762e-04  -4.093 4.28e-05
## regionStLouis:x_large_bags             -8.268e-05  3.208e-05  -2.577 0.009971
## regionSyracuse:x_large_bags            -7.061e-06  4.375e-05  -0.161 0.871772
## regionTampa:x_large_bags               -6.270e-05  3.214e-05  -1.951 0.051068
## regionTotalUS:x_large_bags             -6.764e-05  3.165e-05  -2.137 0.032614
## regionWest:x_large_bags                -7.298e-05  3.178e-05  -2.297 0.021641
## regionWestTexNewMexico:x_large_bags    -7.497e-05  3.184e-05  -2.354 0.018573
##                                           
## (Intercept)                            ***
## typeorganic                            ***
## regionAtlanta                          ***
## regionBaltimoreWashington                 
## regionBoise                            ***
## regionBoston                           .  
## regionBuffaloRochester                 .  
## regionCalifornia                       ***
## regionCharlotte                        ** 
## regionChicago                             
## regionCincinnatiDayton                 ***
## regionColumbus                         ***
## regionDallasFtWorth                    ***
## regionDenver                           ***
## regionDetroit                          ***
## regionGrandRapids                      ***
## regionGreatLakes                       ***
## regionHarrisburgScranton               *  
## regionHartfordSpringfield              ***
## regionHouston                          ***
## regionIndianapolis                     ***
## regionJacksonville                     .  
## regionLasVegas                         ***
## regionLosAngeles                       ***
## regionLouisville                       ***
## regionMiamiFtLauderdale                ***
## regionMidsouth                         ***
## regionNashville                        ***
## regionNewOrleansMobile                 ***
## regionNewYork                          ***
## regionNortheast                        *  
## regionNorthernNewEngland               ***
## regionOrlando                          *  
## regionPhiladelphia                     ***
## regionPhoenixTucson                    ***
## regionPittsburgh                       ***
## regionPlains                           ***
## regionPortland                         ***
## regionRaleighGreensboro                   
## regionRichmondNorfolk                  ***
## regionRoanoke                          ***
## regionSacramento                       ***
## regionSanDiego                         ***
## regionSanFrancisco                     ***
## regionSeattle                          ***
## regionSouthCarolina                    ***
## regionSouthCentral                     ***
## regionSoutheast                        ***
## regionSpokane                          ***
## regionStLouis                          ***
## regionSyracuse                         *  
## regionTampa                            ***
## regionTotalUS                          ***
## regionWest                             ***
## regionWestTexNewMexico                 ***
## quarter2                               ***
## quarter3                               ***
## quarter4                               ***
## year2016                               ***
## year2017                               ***
## year2018                               ***
## x_large_bags                           *  
## regionAtlanta:x_large_bags             *  
## regionBaltimoreWashington:x_large_bags    
## regionBoise:x_large_bags               ** 
## regionBoston:x_large_bags                 
## regionBuffaloRochester:x_large_bags    .  
## regionCalifornia:x_large_bags          *  
## regionCharlotte:x_large_bags           ** 
## regionChicago:x_large_bags                
## regionCincinnatiDayton:x_large_bags       
## regionColumbus:x_large_bags               
## regionDallasFtWorth:x_large_bags       *  
## regionDenver:x_large_bags                 
## regionDetroit:x_large_bags             .  
## regionGrandRapids:x_large_bags         .  
## regionGreatLakes:x_large_bags          *  
## regionHarrisburgScranton:x_large_bags  *  
## regionHartfordSpringfield:x_large_bags ** 
## regionHouston:x_large_bags             .  
## regionIndianapolis:x_large_bags           
## regionJacksonville:x_large_bags        *  
## regionLasVegas:x_large_bags            *  
## regionLosAngeles:x_large_bags          *  
## regionLouisville:x_large_bags             
## regionMiamiFtLauderdale:x_large_bags   .  
## regionMidsouth:x_large_bags            *  
## regionNashville:x_large_bags           .  
## regionNewOrleansMobile:x_large_bags    .  
## regionNewYork:x_large_bags             .  
## regionNortheast:x_large_bags           *  
## regionNorthernNewEngland:x_large_bags     
## regionOrlando:x_large_bags             *  
## regionPhiladelphia:x_large_bags           
## regionPhoenixTucson:x_large_bags       ***
## regionPittsburgh:x_large_bags             
## regionPlains:x_large_bags              *  
## regionPortland:x_large_bags            *  
## regionRaleighGreensboro:x_large_bags   ** 
## regionRichmondNorfolk:x_large_bags     .  
## regionRoanoke:x_large_bags                
## regionSacramento:x_large_bags          ** 
## regionSanDiego:x_large_bags            ** 
## regionSanFrancisco:x_large_bags        ***
## regionSeattle:x_large_bags             .  
## regionSouthCarolina:x_large_bags       *  
## regionSouthCentral:x_large_bags        *  
## regionSoutheast:x_large_bags           *  
## regionSpokane:x_large_bags             ***
## regionStLouis:x_large_bags             ** 
## regionSyracuse:x_large_bags               
## regionTampa:x_large_bags               .  
## regionTotalUS:x_large_bags             *  
## regionWest:x_large_bags                *  
## regionWestTexNewMexico:x_large_bags    *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2465 on 18134 degrees of freedom
## Multiple R-squared:  0.6276, Adjusted R-squared:  0.6253 
## F-statistic: 268.1 on 114 and 18134 DF,  p-value: < 2.2e-16
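Adding region:x_large_bags gives an adjusted R-squared of 0.6253, and many of the individual interaction terms are only marginally significant. Rather than reading dozens of separate p-values, a nested-model F-test with `anova()` checks all the added terms at once. The sketch below uses the built-in `mtcars` data as a stand-in so it runs on its own; with the avocado data the call would look like `anova(model_base, model5pg)`, where `model_base` is a hypothetical name for the same formula without `region:x_large_bags`.

```r
# Stand-in data: built-in mtcars, not the avocado data
base_fit  <- lm(mpg ~ wt + factor(am), data = mtcars)
inter_fit <- lm(mpg ~ wt + factor(am) + wt:factor(am), data = mtcars)

# Nested-model F-test: do the added interaction terms reduce the
# residual sum of squares by more than we would expect by chance?
comparison <- anova(base_fit, inter_fit)
comparison

# p-value for the block of added terms
comparison[["Pr(>F)"]][2]
```

A small p-value favours keeping the interaction; a large one suggests the simpler model is adequate.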
# Allow the seasonal (quarter) effect to differ from year to year (quarter:year interaction)
model5ph <- lm(average_price ~ type + region + quarter + year + x_large_bags + quarter:year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5ph)

summary(model5ph)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     x_large_bags + quarter:year, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.96209 -0.13588 -0.00192  0.13567  1.48311 
## 
## Coefficients: (3 not defined because of singularities)
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.259e+00  1.454e-02  86.603  < 2e-16 ***
## typeorganic                4.983e-01  3.630e-03 137.274  < 2e-16 ***
## regionAtlanta             -2.233e-01  1.846e-02 -12.101  < 2e-16 ***
## regionBaltimoreWashington -2.699e-02  1.846e-02  -1.463 0.143613    
## regionBoise               -2.129e-01  1.846e-02 -11.533  < 2e-16 ***
## regionBoston              -3.020e-02  1.846e-02  -1.636 0.101842    
## regionBuffaloRochester    -4.425e-02  1.846e-02  -2.397 0.016525 *  
## regionCalifornia          -1.717e-01  1.855e-02  -9.257  < 2e-16 ***
## regionCharlotte            4.497e-02  1.846e-02   2.437 0.014836 *  
## regionChicago             -4.646e-03  1.846e-02  -0.252 0.801250    
## regionCincinnatiDayton    -3.521e-01  1.846e-02 -19.077  < 2e-16 ***
## regionColumbus            -3.085e-01  1.846e-02 -16.713  < 2e-16 ***
## regionDallasFtWorth       -4.759e-01  1.846e-02 -25.784  < 2e-16 ***
## regionDenver              -3.425e-01  1.846e-02 -18.556  < 2e-16 ***
## regionDetroit             -2.868e-01  1.847e-02 -15.531  < 2e-16 ***
## regionGrandRapids         -5.695e-02  1.846e-02  -3.085 0.002036 ** 
## regionGreatLakes          -2.298e-01  1.859e-02 -12.358  < 2e-16 ***
## regionHarrisburgScranton  -4.788e-02  1.846e-02  -2.594 0.009488 ** 
## regionHartfordSpringfield  2.576e-01  1.846e-02  13.955  < 2e-16 ***
## regionHouston             -5.134e-01  1.846e-02 -27.819  < 2e-16 ***
## regionIndianapolis        -2.473e-01  1.846e-02 -13.400  < 2e-16 ***
## regionJacksonville        -5.016e-02  1.846e-02  -2.718 0.006578 ** 
## regionLasVegas            -1.801e-01  1.846e-02  -9.758  < 2e-16 ***
## regionLosAngeles          -3.497e-01  1.851e-02 -18.888  < 2e-16 ***
## regionLouisville          -2.744e-01  1.846e-02 -14.869  < 2e-16 ***
## regionMiamiFtLauderdale   -1.328e-01  1.846e-02  -7.198 6.36e-13 ***
## regionMidsouth            -1.578e-01  1.846e-02  -8.548  < 2e-16 ***
## regionNashville           -3.490e-01  1.846e-02 -18.910  < 2e-16 ***
## regionNewOrleansMobile    -2.568e-01  1.846e-02 -13.913  < 2e-16 ***
## regionNewYork              1.662e-01  1.846e-02   9.004  < 2e-16 ***
## regionNortheast            3.943e-02  1.846e-02   2.136 0.032703 *  
## regionNorthernNewEngland  -8.372e-02  1.846e-02  -4.536 5.77e-06 ***
## regionOrlando             -5.505e-02  1.846e-02  -2.983 0.002860 ** 
## regionPhiladelphia         7.102e-02  1.846e-02   3.848 0.000119 ***
## regionPhoenixTucson       -3.367e-01  1.846e-02 -18.245  < 2e-16 ***
## regionPittsburgh          -1.967e-01  1.846e-02 -10.659  < 2e-16 ***
## regionPlains              -1.258e-01  1.846e-02  -6.812 9.90e-12 ***
## regionPortland            -2.434e-01  1.846e-02 -13.185  < 2e-16 ***
## regionRaleighGreensboro   -5.977e-03  1.846e-02  -0.324 0.746076    
## regionRichmondNorfolk     -2.698e-01  1.846e-02 -14.618  < 2e-16 ***
## regionRoanoke             -3.131e-01  1.846e-02 -16.967  < 2e-16 ***
## regionSacramento           6.034e-02  1.846e-02   3.269 0.001079 ** 
## regionSanDiego            -1.630e-01  1.846e-02  -8.831  < 2e-16 ***
## regionSanFrancisco         2.430e-01  1.846e-02  13.165  < 2e-16 ***
## regionSeattle             -1.185e-01  1.846e-02  -6.420 1.40e-10 ***
## regionSouthCarolina       -1.580e-01  1.846e-02  -8.559  < 2e-16 ***
## regionSouthCentral        -4.628e-01  1.848e-02 -25.043  < 2e-16 ***
## regionSoutheast           -1.659e-01  1.848e-02  -8.977  < 2e-16 ***
## regionSpokane             -1.154e-01  1.846e-02  -6.253 4.12e-10 ***
## regionStLouis             -1.306e-01  1.846e-02  -7.077 1.52e-12 ***
## regionSyracuse            -4.071e-02  1.846e-02  -2.206 0.027420 *  
## regionTampa               -1.524e-01  1.846e-02  -8.258  < 2e-16 ***
## regionTotalUS             -2.666e-01  1.998e-02 -13.348  < 2e-16 ***
## regionWest                -2.897e-01  1.846e-02 -15.696  < 2e-16 ***
## regionWestTexNewMexico    -2.969e-01  1.850e-02 -16.051  < 2e-16 ***
## quarter2                   2.117e-02  9.056e-03   2.338 0.019420 *  
## quarter3                   8.279e-02  9.056e-03   9.142  < 2e-16 ***
## quarter4                  -1.080e-02  9.058e-03  -1.192 0.233314    
## year2016                  -1.186e-01  9.059e-03 -13.097  < 2e-16 ***
## year2017                  -5.756e-02  9.061e-03  -6.352 2.17e-10 ***
## year2018                  -6.568e-03  9.262e-03  -0.709 0.478278    
## x_large_bags               3.887e-07  1.206e-07   3.222 0.001273 ** 
## quarter2:year2016         -2.921e-02  1.281e-02  -2.281 0.022572 *  
## quarter3:year2016          9.430e-02  1.281e-02   7.362 1.89e-13 ***
## quarter4:year2016          2.576e-01  1.281e-02  20.108  < 2e-16 ***
## quarter2:year2017          2.074e-01  1.281e-02  16.187  < 2e-16 ***
## quarter3:year2017          3.116e-01  1.281e-02  24.323  < 2e-16 ***
## quarter4:year2017          2.620e-01  1.270e-02  20.641  < 2e-16 ***
## quarter2:year2018                 NA         NA      NA       NA    
## quarter3:year2018                 NA         NA      NA       NA    
## quarter4:year2018                 NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2399 on 18181 degrees of freedom
## Multiple R-squared:  0.6463, Adjusted R-squared:  0.645 
## F-statistic: 495.8 on 67 and 18181 DF,  p-value: < 2.2e-16
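The three `NA` coefficients (quarter2:year2018 to quarter4:year2018) are most likely empty cells: the output suggests 2018 contains first-quarter observations only, so there is no data from which to estimate those interaction terms. The sketch below reproduces the effect on simulated toy data (not the avocado data), built so that the final year stops after Q1. The interaction columns for the missing cells are all zeros, so `lm()` drops them as aliased.

```r
set.seed(42)
# Simulated toy data: 4 quarters x 4 years, 5 observations per cell
toy <- expand.grid(quarter = factor(1:4), year = factor(2015:2018), rep = 1:5)
# Mimic a dataset that stops after Q1 of the final year
# (factor levels for Q2-Q4 and 2018 are kept, so those cells are simply empty)
toy <- toy[!(toy$year == "2018" & toy$quarter != "1"), ]
toy$price <- 1 + rnorm(nrow(toy), sd = 0.1)

# Cross-tabulation shows the empty quarter-by-year cells
table(toy$quarter, toy$year)

fit <- lm(price ~ quarter + year + quarter:year, data = toy)
# Coefficients that lm() could not estimate
names(which(is.na(coef(fit))))
```

On the real data, `table(trimmed_avocados$quarter, trimmed_avocados$year)` is a quick way to confirm which quarter/year cells are empty.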
# Allow the effect of x_large_bags to differ by quarter (quarter:x_large_bags interaction)
model5pi <- lm(average_price ~ type + region + quarter + year + x_large_bags + quarter:x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pi)

summary(model5pi)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     x_large_bags + quarter:x_large_bags, data = trimmed_avocados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0362 -0.1455 -0.0045  0.1442  1.4394 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.167e+00  1.429e-02  81.659  < 2e-16 ***
## typeorganic                4.981e-01  3.765e-03 132.295  < 2e-16 ***
## regionAtlanta             -2.233e-01  1.909e-02 -11.698  < 2e-16 ***
## regionBaltimoreWashington -2.698e-02  1.909e-02  -1.413 0.157567    
## regionBoise               -2.129e-01  1.909e-02 -11.150  < 2e-16 ***
## regionBoston              -3.019e-02  1.909e-02  -1.581 0.113792    
## regionBuffaloRochester    -4.424e-02  1.909e-02  -2.317 0.020490 *  
## regionCalifornia          -1.710e-01  1.923e-02  -8.894  < 2e-16 ***
## regionCharlotte            4.497e-02  1.909e-02   2.356 0.018507 *  
## regionChicago             -4.605e-03  1.909e-02  -0.241 0.809395    
## regionCincinnatiDayton    -3.521e-01  1.909e-02 -18.440  < 2e-16 ***
## regionColumbus            -3.084e-01  1.909e-02 -16.156  < 2e-16 ***
## regionDallasFtWorth       -4.758e-01  1.909e-02 -24.922  < 2e-16 ***
## regionDenver              -3.425e-01  1.909e-02 -17.938  < 2e-16 ***
## regionDetroit             -2.866e-01  1.910e-02 -15.004  < 2e-16 ***
## regionGrandRapids         -5.683e-02  1.909e-02  -2.977 0.002919 ** 
## regionGreatLakes          -2.290e-01  1.926e-02 -11.892  < 2e-16 ***
## regionHarrisburgScranton  -4.787e-02  1.909e-02  -2.507 0.012170 *  
## regionHartfordSpringfield  2.576e-01  1.909e-02  13.491  < 2e-16 ***
## regionHouston             -5.134e-01  1.909e-02 -26.891  < 2e-16 ***
## regionIndianapolis        -2.473e-01  1.909e-02 -12.953  < 2e-16 ***
## regionJacksonville        -5.016e-02  1.909e-02  -2.627 0.008616 ** 
## regionLasVegas            -1.801e-01  1.909e-02  -9.433  < 2e-16 ***
## regionLosAngeles          -3.492e-01  1.917e-02 -18.212  < 2e-16 ***
## regionLouisville          -2.744e-01  1.909e-02 -14.374  < 2e-16 ***
## regionMiamiFtLauderdale   -1.328e-01  1.909e-02  -6.958 3.57e-12 ***
## regionMidsouth            -1.577e-01  1.910e-02  -8.258  < 2e-16 ***
## regionNashville           -3.490e-01  1.909e-02 -18.281  < 2e-16 ***
## regionNewOrleansMobile    -2.567e-01  1.909e-02 -13.447  < 2e-16 ***
## regionNewYork              1.662e-01  1.909e-02   8.705  < 2e-16 ***
## regionNortheast            3.951e-02  1.910e-02   2.069 0.038564 *  
## regionNorthernNewEngland  -8.371e-02  1.909e-02  -4.385 1.17e-05 ***
## regionOrlando             -5.505e-02  1.909e-02  -2.883 0.003941 ** 
## regionPhiladelphia         7.103e-02  1.909e-02   3.721 0.000199 ***
## regionPhoenixTucson       -3.367e-01  1.909e-02 -17.636  < 2e-16 ***
## regionPittsburgh          -1.967e-01  1.909e-02 -10.305  < 2e-16 ***
## regionPlains              -1.257e-01  1.910e-02  -6.582 4.78e-11 ***
## regionPortland            -2.433e-01  1.909e-02 -12.746  < 2e-16 ***
## regionRaleighGreensboro   -5.973e-03  1.909e-02  -0.313 0.754386    
## regionRichmondNorfolk     -2.698e-01  1.909e-02 -14.132  < 2e-16 ***
## regionRoanoke             -3.131e-01  1.909e-02 -16.402  < 2e-16 ***
## regionSacramento           6.037e-02  1.909e-02   3.162 0.001568 ** 
## regionSanDiego            -1.630e-01  1.909e-02  -8.536  < 2e-16 ***
## regionSanFrancisco         2.430e-01  1.909e-02  12.728  < 2e-16 ***
## regionSeattle             -1.185e-01  1.909e-02  -6.206 5.55e-10 ***
## regionSouthCarolina       -1.580e-01  1.909e-02  -8.273  < 2e-16 ***
## regionSouthCentral        -4.624e-01  1.912e-02 -24.183  < 2e-16 ***
## regionSoutheast           -1.657e-01  1.911e-02  -8.671  < 2e-16 ***
## regionSpokane             -1.154e-01  1.909e-02  -6.045 1.53e-09 ***
## regionStLouis             -1.306e-01  1.909e-02  -6.841 8.09e-12 ***
## regionSyracuse            -4.071e-02  1.909e-02  -2.132 0.032995 *  
## regionTampa               -1.524e-01  1.909e-02  -7.983 1.52e-15 ***
## regionTotalUS             -2.643e-01  2.085e-02 -12.677  < 2e-16 ***
## regionWest                -2.896e-01  1.910e-02 -15.166  < 2e-16 ***
## regionWestTexNewMexico    -2.969e-01  1.913e-02 -15.516  < 2e-16 ***
## quarter2                   8.023e-02  5.472e-03  14.661  < 2e-16 ***
## quarter3                   2.180e-01  5.470e-03  39.862  < 2e-16 ***
## quarter4                   1.620e-01  5.440e-03  29.780  < 2e-16 ***
## year2016                  -3.793e-02  4.696e-03  -8.079 6.94e-16 ***
## year2017                   1.375e-01  4.681e-03  29.369  < 2e-16 ***
## year2018                   8.566e-02  8.383e-03  10.219  < 2e-16 ***
## x_large_bags               2.976e-07  2.196e-07   1.355 0.175445    
## quarter2:x_large_bags      1.247e-07  2.852e-07   0.437 0.661831    
## quarter3:x_large_bags      5.626e-08  2.654e-07   0.212 0.832142    
## quarter4:x_large_bags      2.886e-08  4.411e-07   0.065 0.947840    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2482 on 18184 degrees of freedom
## Multiple R-squared:  0.6214, Adjusted R-squared:  0.6201 
## F-statistic: 466.4 on 64 and 18184 DF,  p-value: < 2.2e-16
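Here none of the three quarter:x_large_bags terms is significant (all p > 0.6), and adjusted R-squared drops to 0.6201 from 0.645 for the quarter:year model. With several candidate interaction models, a side-by-side table is easier to read than scrolling through separate summaries. The sketch below defines a hypothetical helper `compare_fits()` (not from any package) that collects adjusted R-squared, AIC, BIC and the number of estimated coefficients for a named list of `lm` fits; `mtcars` is used as a stand-in so it runs on its own.

```r
# Hypothetical helper: one row of fit statistics per model
compare_fits <- function(models) {
  data.frame(
    model  = names(models),
    adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared),
    aic    = sapply(models, AIC),
    bic    = sapply(models, BIC),
    # Count only coefficients that were actually estimated (ignore NA / aliased terms)
    n_coef = sapply(models, function(m) sum(!is.na(coef(m)))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Stand-in data: built-in mtcars, not the avocado data
fits <- list(
  main  = lm(mpg ~ wt + hp, data = mtcars),
  wt_hp = lm(mpg ~ wt + hp + wt:hp, data = mtcars)
)
compare_fits(fits)
```

With the avocado models the call would be `compare_fits(list(pg = model5pg, ph = model5ph, pi = model5pi, pj = model5pj))`. Lower AIC/BIC and higher adjusted R-squared point the same way when an interaction is worth its extra coefficients; BIC penalises the many region dummies more heavily.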
# Allow the effect of x_large_bags to differ by year (year:x_large_bags interaction)
model5pj <- lm(average_price ~ type + region + quarter + year + x_large_bags + year:x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pj)

summary(model5pj)
## 
## Call:
## lm(formula = average_price ~ type + region + quarter + year + 
##     x_large_bags + year:x_large_bags, data = trimmed_avocados)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.03659 -0.14579 -0.00433  0.14385  1.44061 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                1.168e+00  1.429e-02  81.749  < 2e-16 ***
## typeorganic                4.974e-01  3.764e-03 132.154  < 2e-16 ***
## regionAtlanta             -2.234e-01  1.909e-02 -11.704  < 2e-16 ***
## regionBaltimoreWashington -2.699e-02  1.909e-02  -1.414 0.157396    
## regionBoise               -2.129e-01  1.909e-02 -11.152  < 2e-16 ***
## regionBoston              -3.020e-02  1.909e-02  -1.582 0.113661    
## regionBuffaloRochester    -4.425e-02  1.909e-02  -2.318 0.020447 *  
## regionCalifornia          -1.711e-01  1.919e-02  -8.919  < 2e-16 ***
## regionCharlotte            4.496e-02  1.909e-02   2.356 0.018508 *  
## regionChicago             -4.466e-03  1.909e-02  -0.234 0.815010    
## regionCincinnatiDayton    -3.516e-01  1.909e-02 -18.420  < 2e-16 ***
## regionColumbus            -3.080e-01  1.909e-02 -16.136  < 2e-16 ***
## regionDallasFtWorth       -4.756e-01  1.909e-02 -24.916  < 2e-16 ***
## regionDenver              -3.424e-01  1.909e-02 -17.941  < 2e-16 ***
## regionDetroit             -2.846e-01  1.911e-02 -14.895  < 2e-16 ***
## regionGrandRapids         -5.664e-02  1.909e-02  -2.967 0.003014 ** 
## regionGreatLakes          -2.233e-01  1.934e-02 -11.543  < 2e-16 ***
## regionHarrisburgScranton  -4.784e-02  1.909e-02  -2.506 0.012213 *  
## regionHartfordSpringfield  2.576e-01  1.909e-02  13.494  < 2e-16 ***
## regionHouston             -5.131e-01  1.909e-02 -26.880  < 2e-16 ***
## regionIndianapolis        -2.468e-01  1.909e-02 -12.931  < 2e-16 ***
## regionJacksonville        -5.016e-02  1.909e-02  -2.628 0.008599 ** 
## regionLasVegas            -1.801e-01  1.909e-02  -9.435  < 2e-16 ***
## regionLosAngeles          -3.490e-01  1.915e-02 -18.229  < 2e-16 ***
## regionLouisville          -2.742e-01  1.909e-02 -14.368  < 2e-16 ***
## regionMiamiFtLauderdale   -1.328e-01  1.909e-02  -6.959 3.55e-12 ***
## regionMidsouth            -1.574e-01  1.909e-02  -8.244  < 2e-16 ***
## regionNashville           -3.490e-01  1.909e-02 -18.284  < 2e-16 ***
## regionNewOrleansMobile    -2.568e-01  1.909e-02 -13.454  < 2e-16 ***
## regionNewYork              1.661e-01  1.909e-02   8.703  < 2e-16 ***
## regionNortheast            3.946e-02  1.909e-02   2.067 0.038749 *  
## regionNorthernNewEngland  -8.372e-02  1.909e-02  -4.386 1.16e-05 ***
## regionOrlando             -5.503e-02  1.909e-02  -2.883 0.003940 ** 
## regionPhiladelphia         7.103e-02  1.909e-02   3.721 0.000199 ***
## regionPhoenixTucson       -3.367e-01  1.909e-02 -17.642  < 2e-16 ***
## regionPittsburgh          -1.967e-01  1.909e-02 -10.307  < 2e-16 ***
## regionPlains              -1.253e-01  1.909e-02  -6.565 5.33e-11 ***
## regionPortland            -2.433e-01  1.909e-02 -12.745  < 2e-16 ***
## regionRaleighGreensboro   -5.975e-03  1.909e-02  -0.313 0.754243    
## regionRichmondNorfolk     -2.698e-01  1.909e-02 -14.135  < 2e-16 ***
## regionRoanoke             -3.131e-01  1.909e-02 -16.406  < 2e-16 ***
## regionSacramento           6.033e-02  1.909e-02   3.161 0.001576 ** 
## regionSanDiego            -1.630e-01  1.909e-02  -8.539  < 2e-16 ***
## regionSanFrancisco         2.430e-01  1.909e-02  12.729  < 2e-16 ***
## regionSeattle             -1.185e-01  1.909e-02  -6.207 5.53e-10 ***
## regionSouthCarolina       -1.580e-01  1.909e-02  -8.278  < 2e-16 ***
## regionSouthCentral        -4.616e-01  1.911e-02 -24.147  < 2e-16 ***
## regionSoutheast           -1.660e-01  1.911e-02  -8.687  < 2e-16 ***
## regionSpokane             -1.154e-01  1.909e-02  -6.046 1.51e-09 ***
## regionStLouis             -1.306e-01  1.909e-02  -6.842 8.07e-12 ***
## regionSyracuse            -4.071e-02  1.909e-02  -2.133 0.032948 *  
## regionTampa               -1.524e-01  1.909e-02  -7.984 1.50e-15 ***
## regionTotalUS             -2.574e-01  2.084e-02 -12.350  < 2e-16 ***
## regionWest                -2.895e-01  1.909e-02 -15.163  < 2e-16 ***
## regionWestTexNewMexico    -2.967e-01  1.913e-02 -15.509  < 2e-16 ***
## quarter2                   8.056e-02  5.411e-03  14.887  < 2e-16 ***
## quarter3                   2.183e-01  5.415e-03  40.325  < 2e-16 ***
## quarter4                   1.626e-01  5.378e-03  30.244  < 2e-16 ***
## year2016                  -3.908e-02  4.749e-03  -8.229  < 2e-16 ***
## year2017                   1.354e-01  4.739e-03  28.580  < 2e-16 ***
## year2018                   8.441e-02  8.477e-03   9.957  < 2e-16 ***
## x_large_bags              -1.140e-06  5.468e-07  -2.085 0.037091 *  
## year2016:x_large_bags      1.419e-06  5.571e-07   2.547 0.010880 *  
## year2017:x_large_bags      1.642e-06  5.537e-07   2.966 0.003023 ** 
## year2018:x_large_bags      1.461e-06  5.948e-07   2.456 0.014054 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2481 on 18184 degrees of freedom
## Multiple R-squared:  0.6216, Adjusted R-squared:  0.6203 
## F-statistic: 466.8 on 64 and 18184 DF,  p-value: < 2.2e-16

So it looks like model5pa, with type, region, quarter, year and x_large_bags plus the type:region interaction, is the best, with a moderate gain in multiple \(R^2\) due to the interaction. However, because the associated coefficients have a wide range of \(p\)-values, we need to test whether the interaction as a whole is significant:

anova(model5, model5pa)

Neat, it looks like including the interaction is statistically justified.
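anova() applied to two nested lm() fits carries out a partial F-test: it asks whether the extra coefficients in the larger model (here, the interaction terms) jointly reduce the residual sum of squares by more than we would expect by chance. Below is a minimal sketch of the same computation done by hand. It uses the built-in mtcars data purely as a stand-in, and the formulas are illustrative rather than the avocado models.

```r
# Partial F-test for nested linear models, computed by hand and via anova().
# mtcars is only a stand-in dataset; the formulas are illustrative.
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)

reduced <- lm(mpg ~ wt + cyl, data = mtcars)   # main effects only
full    <- lm(mpg ~ wt * cyl, data = mtcars)   # adds the wt:cyl interaction

rss_reduced <- sum(residuals(reduced)^2)
rss_full    <- sum(residuals(full)^2)
df_extra    <- df.residual(reduced) - df.residual(full)  # number of extra coefficients

# F = [(RSS_reduced - RSS_full) / q] / [RSS_full / residual df of the full model]
f_stat  <- ((rss_reduced - rss_full) / df_extra) / (rss_full / df.residual(full))
p_value <- pf(f_stat, df_extra, df.residual(full), lower.tail = FALSE)

comparison <- anova(reduced, full)
c(manual_F = f_stat, anova_F = comparison$F[2])
c(manual_p = p_value, anova_p = comparison$`Pr(>F)`[2])
```

A small \(p\)-value means that the interaction terms, taken together, explain significantly more variation than the main effects alone, even if some of the individual interaction coefficients are not significant.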

1.7 Automated approach

Let’s try to fit a predictive model using glmulti().

library(glmulti)
## Loading required package: rJava

1.7.1 Train-test split:

This data is fairly large for glmulti running on a single CPU core, so we probably won’t be able to search over main effects and pairwise interactions simultaneously. Let’s first look for the best main effects model, using BIC as our metric:
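To see why adding interactions makes an exhaustive search impractical, we can count the candidate models. The sketch below assumes that the empty model is counted and that marginality = TRUE means the strong-hierarchy rule described in the glmulti code comments further down. glmulti's own counts may differ slightly, for example because of minsize, and k = 12 is a hypothetical number of predictors rather than the exact number of columns in trimmed_avocados.

```r
# Number of candidate models in an exhaustive search over k predictors.
# Main effects only: every subset of the k predictors (including the empty model).
n_models_main <- function(k) 2^k

# Main effects plus pairwise interactions under strong marginality: choose j main
# effects, then any subset of the choose(j, 2) pairwise interactions among them.
n_models_pairs_marginal <- function(k) {
  j <- 0:k
  sum(choose(k, j) * 2^choose(j, 2))
}

k <- 12  # hypothetical number of candidate predictors
n_models_main(k)
n_models_pairs_marginal(k)
```

The main-effects count stays in the thousands, but allowing pairwise interactions pushes the count beyond \(10^{19}\). At that scale, a genetic search (method = "g") or a restricted candidate set is the only realistic option.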

# we're putting set.seed() in here for reproducibility, but you shouldn't include
# this in production code
set.seed(42)
n_data <- nrow(trimmed_avocados)
test_index <- sample(1:n_data, size = n_data * 0.2)

test  <- slice(trimmed_avocados, test_index)
train <- slice(trimmed_avocados, -test_index)

# sanity check
nrow(test) + nrow(train) == n_data
## [1] TRUE
nrow(test)
## [1] 3649
nrow(train)
## [1] 14600
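A single 80/20 split gives one, somewhat noisy, estimate of test error. k-fold cross-validation averages over several splits instead. The following is a minimal base-R sketch. The helper kfold_rmse() is hypothetical, and mtcars and the formula are stand-ins for the avocado data and models.

```r
# k-fold cross-validation for a linear model, base R only.
# kfold_rmse() is a hypothetical helper; mtcars and the formula are stand-ins.
kfold_rmse <- function(formula, data, k = 5) {
  n <- nrow(data)
  # assign each row to one of k folds at random, keeping fold sizes balanced
  folds <- sample(rep(seq_len(k), length.out = n))
  response <- all.vars(formula)[1]
  vapply(seq_len(k), function(i) {
    fit  <- lm(formula, data = data[folds != i, ])   # train on the other k - 1 folds
    held <- data[folds == i, ]                       # evaluate on the held-out fold
    sqrt(mean((held[[response]] - predict(fit, newdata = held))^2))
  }, numeric(1))
}

set.seed(42)  # for reproducibility of the fold assignment only
fold_rmse <- kfold_rmse(mpg ~ wt + hp, data = mtcars, k = 5)
fold_rmse        # test RMSE on each held-out fold
mean(fold_rmse)  # cross-validated estimate of test RMSE
```

With factor predictors such as region, a fold's training set could in principle miss a level entirely, which would make predict() fail on that fold. With the avocado data, where every region appears many times, this is unlikely.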
glmulti_fit <- glmulti(
  average_price ~ ., 
  data = train,
  level = 1, # 2 = include pairwise interactions, 1 = main effects only (main effect = no pairwise interactions)
  minsize = 1, # candidate models must contain at least one term
  maxsize = -1, # -1 = no max size of model
  marginality = TRUE, # marginality here means the same as 'strongly hierarchical' interactions, i.e. include pairwise interactions only if both predictors present in the model as main effects.
  method = "h", # try exhaustive search, or could use "g" for genetic algorithm instead
  crit = bic, # criterion for model selection is BIC value (lower is better)
  plotty = FALSE, # don't plot models as function runs
  report = TRUE, # do produce reports as function runs
  confsetsize = 10, # return best 10 solutions
  fitfunction = lm # fit using the `lm` function
)
## Initialization...
## TASK: Exhaustive screening of candidate set.
## Fitting...
## 
## After 50 models:
## Best model: average_price~1+total_volume+x4225+x4770+small_bags
## Crit= 14290.9309630974
## Mean crit= 14302.7404848229
## 
## After 100 models:
## Best model: average_price~1+total_volume+x4046+x4770+large_bags
## Crit= 14287.0201451487
## Mean crit= 14295.0164087599
## 
## After 150 models:
## Best model: average_price~1+x4046+x4225+x4770+x_large_bags
## Crit= 14282.9391871136
## Mean crit= 14288.655212267
## 
## After 200 models:
## Best model: average_price~1+total_volume+x4225+x4770+small_bags+x_large_bags
## Crit= 14279.4193694914
## Mean crit= 14287.2170591254
## 
## After 250 models:
## Best model: average_price~1+total_volume+x4225+x4770+small_bags+x_large_bags
## Crit= 14279.4193694914
## Mean crit= 14285.8311251354
## 
## After 300 models:
## Best model: average_price~1+x4225+region
## Crit= 11937.9201055248
## Mean crit= 11948.2560227565
## 
## After 350 models:
## Best model: average_price~1+x4225+region
## Crit= 11937.9201055248
## Mean crit= 11946.7914993621
## 
## After 400 models:
## Best model: average_price~1+total_volume+x4046+x_large_bags+region
## Crit= 11925.5711979638
## Mean crit= 11936.6091736735
## 
## After 450 models:
## Best model: average_price~1+total_volume+x4225+x_large_bags+region
## Crit= 11921.550556659
## Mean crit= 11926.6655997315
## 
## After 500 models:
## Best model: average_price~1+total_volume+x4225+x_large_bags+region
## Crit= 11921.550556659
## Mean crit= 11926.6075265317
## 
## After 550 models:
## Best model: average_price~1+type+x4046+x4225+x4770
## Crit= 7734.8593327961
## Mean crit= 7793.00094370359
## 
## After 600 models:
## Best model: average_price~1+type+total_volume+x4225+x4770+small_bags
## Crit= 7697.79186941605
## Mean crit= 7707.77632276868
## 
## After 650 models:
## Best model: average_price~1+type+total_volume+x4046+x4225+x4770+small_bags+large_bags
## Crit= 7665.37294598611
## Mean crit= 7691.99478130502
## 
## After 700 models:
## Best model: average_price~1+type+total_volume+x4046+x4225+total_bags+small_bags+large_bags
## Crit= 7665.32155031575
## Mean crit= 7671.67032729922
## 
## After 750 models:
## Best model: average_price~1+type+total_volume+x4225+x4770+small_bags+x_large_bags
## Crit= 7657.97738881932
## Mean crit= 7664.22564265621
## 
## After 800 models:
## Best model: average_price~1+type+total_volume+x4225+region
## Crit= 3977.52101108293
## Mean crit= 5088.37870955926
## 
## After 850 models:
## Best model: average_price~1+type+total_volume+small_bags+region
## Crit= 3964.67515907674
## Mean crit= 3970.85743694413
## 
## After 900 models:
## Best model: average_price~1+type+total_volume+small_bags+region
## Crit= 3964.67515907674
## Mean crit= 3969.31227685631
## 
## After 950 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3925.01550491875
## 
## After 1000 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.49480426776
## 
## After 1050 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1100 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1150 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1200 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1250 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1300 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1350 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1400 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1450 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1500 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1550 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1600 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1650 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1700 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1750 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1800 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1850 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
## 
## After 1900 models:
## Best model: average_price~1+type+year+x4225+region
## Crit= 2782.77393470439
## Mean crit= 2786.76963771313
## 
## After 1950 models:
## Best model: average_price~1+type+year+x4225+region
## Crit= 2782.77393470439
## Mean crit= 2785.83493132762
## 
## After 2000 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2749.79339051662
## 
## After 2050 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2747.16085350457
## 
## After 2100 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2150 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2200 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2250 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2300 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2350 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2400 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2450 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2500 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2550 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2600 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2650 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2700 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2750 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2800 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2850 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2900 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
## 
## After 2950 models:
## Best model: average_price~1+type+quarter+x4225+small_bags+region
## Crit= 2606.2768667676
## Mean crit= 2614.89356850007
## 
## After 3000 models:
## Best model: average_price~1+type+quarter+x4225+small_bags+region
## Crit= 2606.2768667676
## Mean crit= 2612.22085451977
## 
## After 3050 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2593.56894619082
## 
## After 3100 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.63505461122
## 
## After 3150 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3200 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3250 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3300 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3350 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3400 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3450 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3500 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3550 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3600 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3650 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3700 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3750 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3800 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3850 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3900 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 3950 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
## 
## After 4000 models:
## Best model: average_price~1+type+year+quarter+x4225+x4770+region
## Crit= 1373.42931760425
## Mean crit= 1379.26585957081
## 
## After 4050 models:
## Best model: average_price~1+type+year+quarter+x4225+x4770+region
## Crit= 1373.42931760425
## Mean crit= 1377.67893634087
## 
## After 4100 models:
## Best model: average_price~1+type+year+quarter+total_bags+small_bags+large_bags+region
## Crit= 1364.21555034686
## Mean crit= 1371.92259424323
## 
## After 4150 models:
## Best model: average_price~1+type+year+quarter+total_volume+x_large_bags+region
## Crit= 1355.81959712867
## Mean crit= 1360.09536364159
## 
## After 4200 models:
## Best model: average_price~1+type+year+quarter+total_volume+x_large_bags+region
## Crit= 1355.81959712867
## Mean crit= 1358.68681119117
## 
## After 4250 models:
## Best model: average_price~1+type+year+quarter+total_volume+x_large_bags+region
## Crit= 1355.81959712867
## Mean crit= 1358.68681119117
## Completed.
summary(glmulti_fit)
## $name
## [1] "glmulti.analysis"
## 
## $method
## [1] "h"
## 
## $fitting
## [1] "lm"
## 
## $crit
## [1] "bic"
## 
## $level
## [1] 1
## 
## $marginality
## [1] TRUE
## 
## $confsetsize
## [1] 10
## 
## $bestic
## [1] 1355.82
## 
## $icvalues
##  [1] 1355.820 1356.338 1356.942 1358.332 1359.273 1359.330 1359.344 1360.079
##  [9] 1360.619 1360.791
## 
## $bestmodel
## [1] "average_price ~ 1 + type + year + quarter + total_volume + x_large_bags + "
## [2] "    region"                                                                
## 
## $modelweights
##  [1] 0.29050767 0.22415209 0.16577867 0.08272938 0.05166463 0.05021337
##  [7] 0.04988139 0.03452781 0.02636050 0.02418450
## 
## $includeobjects
## [1] TRUE
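The $modelweights appear to be computed from the BIC values in the confidence set as \(w_i = e^{-\Delta_i/2} / \sum_j e^{-\Delta_j/2}\), where \(\Delta_i\) is each model's BIC minus the lowest BIC. As a quick check, the sketch below uses the rounded $icvalues printed above (the formula is our assumption about glmulti's internals).

```r
# Reproduce glmulti's $modelweights from the (rounded) $icvalues printed above,
# assuming weights of the form exp(-delta / 2), normalised over the confidence set.
icvalues <- c(1355.820, 1356.338, 1356.942, 1358.332, 1359.273,
              1359.330, 1359.344, 1360.079, 1360.619, 1360.791)
delta         <- icvalues - min(icvalues)   # BIC difference from the best model
model_weights <- exp(-delta / 2) / sum(exp(-delta / 2))
round(model_weights, 3)
```

These weights agree with the printed $modelweights to about three decimal places. The best model carries only about 29% of the weight, so several of the top models are nearly as well supported by the data.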

So the lowest-BIC model with main effects only is average_price ~ type + year + quarter + total_volume + x_large_bags + region. Let’s look at some possible extensions to it. We will deliberately push model complexity to the point where the models start to overfit (as judged by the RMSE on the test set), so that we can see what this looks like.
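The fit-and-record blocks that follow all share the same pattern, so they could also be driven by a named list of formulas. Here is a minimal base-R sketch. mtcars, the formulas, and the `_demo` names are illustrative. rmse_on() is a hand-rolled stand-in for modelr::rmse(), which we assume computes the root mean squared prediction error on the supplied data. With the real data you would use train, test, glmulti's bic() and modelr's rmse() instead.

```r
# Fit several candidate formulas and collect BIC and train/test RMSE in one data frame.
# mtcars, the formulas and the *_demo names are illustrative stand-ins.
rmse_on <- function(model, data) {
  response <- all.vars(formula(model))[1]
  sqrt(mean((data[[response]] - predict(model, newdata = data))^2))
}

set.seed(42)
test_index_demo <- sample(seq_len(nrow(mtcars)), size = 8)
train_demo <- mtcars[-test_index_demo, ]
test_demo  <- mtcars[test_index_demo, ]

candidates <- list(
  "wt"          = mpg ~ wt,
  "wt + hp"     = mpg ~ wt + hp,
  "(wt + hp)^2" = mpg ~ (wt + hp)^2
)

results_demo <- do.call(rbind, lapply(names(candidates), function(name) {
  fit <- lm(candidates[[name]], data = train_demo)
  data.frame(
    name       = name,
    bic        = BIC(fit),
    rmse_train = rmse_on(fit, train_demo),
    rmse_test  = rmse_on(fit, test_demo)
  )
}))
results_demo
```

This structure makes adding a new candidate model a one-line change. It also keeps the model names and formulas in one place.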

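# start from an empty tibble with typed columns, so add_row() has columns to fill
results <- tibble(
  name = character(), bic = numeric(), rmse_train = numeric(), rmse_test = numeric()
)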
# lowest BIC model with main effects
lowest_bic_model <- lm(average_price ~ type + year + quarter + total_volume + x_large_bags + region, data = train)
results <- results %>%
  add_row(
    tibble_row(
      name = "lowest bic", 
      bic = bic(lowest_bic_model),
      rmse_train = rmse(lowest_bic_model, train),
      rmse_test = rmse(lowest_bic_model, test)
    )
  )

# try adding in all possible pairs with these main effects
lowest_bic_model_all_pairs <- lm(average_price ~ (type + year + quarter + total_volume + x_large_bags + region)^2, data = train)
results <- results %>%
  add_row(
    tibble_row(
      name = "lowest bic all pairs", 
      bic = bic(lowest_bic_model_all_pairs),
      rmse_train = rmse(lowest_bic_model_all_pairs, train),
      rmse_test = rmse(lowest_bic_model_all_pairs, test)
    )
  )
## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading

## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading
# try a model with all main effects
model_all_mains <- lm(average_price ~ ., data = train)
results <- results %>%
  add_row(
    tibble_row(
      name = "all mains", 
      bic = bic(model_all_mains),
      rmse_train = rmse(model_all_mains, train),
      rmse_test = rmse(model_all_mains, test)
    )
  )

# try a model with all main effects and all pairs
model_all_pairs <- lm(average_price ~ .^2, data = train)
results <- results %>%
  add_row(
    tibble_row(
      name = "all pairs", 
      bic = bic(model_all_pairs),
      rmse_train = rmse(model_all_pairs, train),
      rmse_test = rmse(model_all_pairs, test)
    )
  )
## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading

## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading
# try a model with all main effects, all pairs and one triple (this is getting silly)
model_all_pairs_one_triple <- lm(average_price ~ .^2 + region:type:year, data = train)
results <- results %>%
  add_row(
    tibble_row(
      name = "all pairs one triple",
      bic = bic(model_all_pairs_one_triple),
      rmse_train = rmse(model_all_pairs_one_triple, train),
      rmse_test = rmse(model_all_pairs_one_triple, test)
    )
  )
## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading

## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading
# try a model with all main effects, all pairs and multiple triples (more silly)
model_all_pairs_multi_triples <- lm(average_price ~ .^2 + region:type:year + region:type:quarter + region:year:quarter, data = train)
results <- results %>%
  add_row(
    tibble_row(
      name = "all pairs multi triples",
      bic = bic(model_all_pairs_multi_triples),
      rmse_train = rmse(model_all_pairs_multi_triples, train),
      rmse_test = rmse(model_all_pairs_multi_triples, test)
    )
  )
## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading

## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading
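The "prediction from a rank-deficient fit may be misleading" warnings most likely arise because, once many interactions are included, some columns of the model matrix are exact linear combinations of others. An example would be factor combinations that never occur together in train. In that case, lm() reports the affected coefficients as NA. The toy sketch below (invented data, not the avocado set) shows the mechanism. stats::alias() can also be used to list the aliased terms.

```r
# Toy data in which one combination of two factors (a = "y", b = "q") is never
# observed, so the corresponding interaction column is all zeros.
toy <- data.frame(
  a = factor(c("x", "x", "y", "y", "x")),
  b = factor(c("p", "q", "p", "p", "p")),
  y = c(1.0, 2.0, 3.0, 4.0, 1.5)
)
fit_toy <- lm(y ~ a * b, data = toy)

coef(fit_toy)               # the ay:bq coefficient is NA (not estimable)
sum(is.na(coef(fit_toy)))   # number of aliased coefficients
fit_toy$rank                # rank of the model matrix, less than length(coef(fit_toy))
```

Predictions from such a fit still work, but they rely on an arbitrary choice of which aliased coefficient to drop, which is why R warns about them.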
results <- results %>%
  pivot_longer(cols = bic:rmse_test, names_to = "measure", values_to = "value") %>%
  mutate(
    name = fct_relevel(
      as_factor(name),
      "lowest bic", "all mains", "lowest bic all pairs", "all pairs", "all pairs one triple", "all pairs multi triples"
    )
  )
results %>%
  filter(measure == "bic") %>%
  ggplot(aes(x = name, y = value)) +
  geom_col(fill = "steelblue", alpha = 0.7) +
  labs(
    x = "model",
    y = "bic"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  geom_hline(yintercept = 0)

BIC suggests that adding all possible pairwise interactions to our lowest-BIC main effects model still improves it. The ‘lowest bic all pairs’ model has the lowest BIC of the models compared, i.e. the best trade-off between goodness of fit and complexity. All the other models have noticeably higher BIC, which suggests they are either underfitting (the main effects models) or paying too high a complexity penalty (the models with more interactions).
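For reference, BIC \(= -2 \log L + k \log n\). The log-likelihood term rewards fit, and each of the \(k\) estimated parameters costs \(\log n\). With \(n = 14600\) training rows, that is roughly 9.6 BIC units per extra coefficient. Below is a minimal sketch comparing a hand computation with stats::BIC() on mtcars as a stand-in. We assume glmulti's bic() follows the same definition.

```r
# BIC for a Gaussian linear model, computed by hand and via stats::BIC().
# For lm, -2 log L = n * (log(2 * pi) + 1 + log(RSS / n)).
fit <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nobs(fit)
rss <- sum(residuals(fit)^2)
k   <- fit$rank + 1   # estimated (non-NA) coefficients plus the residual variance

bic_manual <- n * (log(2 * pi) + 1 + log(rss / n)) + k * log(n)
c(manual = bic_manual, stats = BIC(fit))
```

Using fit$rank rather than length(coef(fit)) matters for the rank-deficient interaction models above, because their NA coefficients are not counted as parameters.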

Let’s compare the RMSE values of the various models on the train and test sets. We expect the train RMSE to keep falling as model complexity increases, but what happens to the test RMSE as the models get more complex?
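To see the expected pattern in isolation, here is a small simulation on invented data, not the avocado data. It fits polynomials of increasing degree to a noisy sine curve. The curve, noise level, sample sizes and degrees are all arbitrary choices made for illustration.

```r
# Simulated illustration of overfitting: fit polynomials of increasing degree
# and compare RMSE on the training data with RMSE on fresh held-out data.
set.seed(1)
simulate <- function(n) {
  x <- runif(n, 0, 3)
  data.frame(x = x, y = sin(2 * x) + rnorm(n, sd = 0.3))
}
train_sim <- simulate(40)
test_sim  <- simulate(200)

rmse_on <- function(model, data) {
  response <- all.vars(formula(model))[1]
  sqrt(mean((data[[response]] - predict(model, newdata = data))^2))
}

degrees <- 1:12
fits <- lapply(degrees, function(d) lm(y ~ poly(x, d), data = train_sim))
rmse_table <- data.frame(
  degree     = degrees,
  rmse_train = sapply(fits, rmse_on, data = train_sim),
  rmse_test  = sapply(fits, rmse_on, data = test_sim)
)
rmse_table
```

Because the polynomial models are nested, the train RMSE can only go down as the degree increases. The test RMSE typically falls at first and then rises once the extra terms start fitting noise rather than signal.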

results %>%
  filter(measure != "bic") %>%
  ggplot(aes(x = name, y = value, fill = measure)) +
  geom_col(position = "dodge", alpha = 0.7) +
  labs(
    x = "model",
    y = "rmse"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The lowest test RMSE is obtained for the ‘lowest bic all pairs’ model. The test RMSE then increases for the more complex models even as their train RMSE keeps falling, which suggests that these models are overfitting the training data.